As LLMs become the go-to for quick answers, fewer people are posting questions on forums or social media. This shift could make online searches less fruitful in the future, with fewer discussions and solutions available publicly. Imagine troubleshooting a tech issue and finding nothing online because everyone else asked an LLM instead. You do the same, but the LLM only knows the manual, offering no further help. Stuck, you contact tech support, wait weeks for a reply, and the cycle continues—no new training data for LLMs or new pages for search engines to index. Could this lead to a future where both search results and LLMs are less effective?
My 70-year-old boss and his 50-year-old business partner just today generated a set of instructions for scanning to a thumb drive on a specific model of printer.
They obviously missed the “AI Generated” tag on the Google search and couldn’t figure out why the instructions cited the exact model but told them to press buttons and navigate menus that didn’t exist.
These are average people, and they didn’t realize that they were even using AI, much less how unreliable it can be.
I think there’s going to be a place for forums to discuss niche problems for as long as AI just means an advanced LLM and not actual intelligence.
When diagnosing software related tech problems with proper instructions, there’s always the risk of finding outdated tips. You may be advised to press buttons that no longer exist in the version you’re currently using.
With hardware though, that’s unlikely to happen, as long as the model numbers match. However, when relying on AI generated instructions, anything is possible.
Maybe in the sense that the Internet may become so inundated with AI garbage that the only way to get factual information is by actually reading a book or finding a real person to ask, face to face.
You know how steel made before the first nuclear tests is prized because it’s free of fallout contamination? I wonder if that’s going to happen with data from before 2022 as well now. Lol.
There might be a way to mitigate that damage. You could categorize the training data by the source. If it’s verified to be written by a human, you could give it a bigger weight. If not, it’s probably contaminated by AI, so give it a smaller weight. Humans still exist, so it’s still possible to obtain clean data. Quantity is still a problem, since these models are really thirsty for data.
LLMs can’t distinguish truth from falsehoods; they only produce output that resembles other output. So they can’t tell the difference between human and AI input.
That’s a problem when you want to automate the curation and annotation process. So far, you could have just dumped all of your data into the model, but that might not be an option in the future, as more and more of the training data is generated by other LLMs.
When that approach stops working, AI companies need to figure out a way to get high quality data, and that’s when it becomes useful to have data that was verified to be written by actual people. This way, an AI doesn’t even need to be able to curate the data, as humans have done that to some extent. You could just prioritize the small amount of verified data while still using the vast amounts of unverified data for training.
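To make the weighting idea a bit more concrete, here is a minimal sketch in Python. It assumes each training sample already carries a provenance label, which is the hard part in practice. The label names and the 5:1 weight ratio are made up for illustration and don't come from any real training pipeline.

```python
import random

# Hypothetical provenance weights: verified human text counts more.
SOURCE_WEIGHTS = {"verified_human": 5.0, "unverified": 1.0}

def sampling_weights(samples):
    """Return one normalized sampling probability per sample."""
    raw = [SOURCE_WEIGHTS[s["source"]] for s in samples]
    total = sum(raw)
    return [w / total for w in raw]

def draw_batch(samples, k, seed=0):
    """Draw k samples with replacement, favouring verified sources."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=sampling_weights(samples), k=k)

# Toy corpus; the texts and labels are placeholders.
corpus = [
    {"text": "forum answer written by a person", "source": "verified_human"},
    {"text": "scraped page of unknown origin", "source": "unverified"},
    {"text": "another scraped page", "source": "unverified"},
]
probs = sampling_weights(corpus)
batch = draw_batch(corpus, k=4)
```

With these made-up weights, the one verified sample is drawn about five times as often as each unverified one, while the unverified data still contributes to training.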
Trouble is that ‘quick answers’ mean the LLM took no time to do a thorough search. Could be right or wrong - just by luck.
When you need the details to be verified by trustworthy sources, it’s still do-it-yourself time. If you -don’t- verify, and repeat a wrong answer to someone else, -you- are untrustworthy.
A couple months back I asked GPT a math question (about primes) and it gave me the -completely wrong- answer … ‘none’ … answered as if it had no doubt. It was -so- wrong it hadn’t even tried. I pointed it to the right answer (‘an infinite number’) and to the proof. It then verified that.
A couple of days ago, I asked it the same question … and it was completely wrong again. It hadn’t learned a thing. After some conversation, it told me it couldn’t learn. I’d already figured that out.
Trouble is that ‘quick answers’ mean the LLM took no time to do a thorough search.
LLMs don’t “search”. They essentially provide weighted parrot-answers based on what they’ve seen elsewhere.
If you tell an LLM that the sky is red, they will tell you the sky is red. If you tell them your eyes are the colour of the sky, they will repeat that your eyes are red. LLMs aren’t capable of checking if something is true.
They’re just really fast parrots with a big vocabulary. And every time they squawk, it burns a tree.
Math problems are a unique challenge for LLMs, often resulting in bizarre mistakes. While an LLM can look up formulas and constants, it usually struggles with applying them correctly. Sort of, like counting the hours in a week, it says it calculates 7*24, which looks good, but somehow the answer is still 10 🤯. Like, WTF? How did that happen? In reality, that specific problem might not be that hard, but the same phenomenon can still be seen in more complicated problems. I could give some other examples too, but this post is long enough as it is.
For reliable results in math-related queries, I find it best to ask the LLM for formulas and values, then perform the calculations myself. The LLM can typically look up information reasonably accurately but will mess up the application. Just use the right tool for the right job, and you’ll be ok.
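As a small illustration of "do the calculation yourself", here is a sketch using the hours-in-a-week example from above. The function names are my own. The only inputs are the two constants you would take from the LLM or any reference, and the arithmetic is done locally.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7

def hours_in_week():
    """Compute the value locally instead of trusting a reported result."""
    return DAYS_PER_WEEK * HOURS_PER_DAY

def check_claim(claimed):
    """Compare a number an LLM reported against the locally computed value."""
    return claimed == hours_in_week()

print(hours_in_week())  # 168
print(check_claim(10))  # False
```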
Is your abuse of the ellipsis and dashes supposed to be ironic? Isn’t that an LLM tell?
I’m not sure what the (‘phrase’) construct is even meant to imply, but it’s wild. Your abuse of punctuation in general feels like a machine trying to convince us it’s human or a machine transcribing a human’s stream of consciousness.
No. It hallucinates all the time.
Yes, but search engines will serve you LLM generated slop instead of search results, and sites like Stack Overflow will die due to lack of visitors, so the internet will become a reddit-like useless LLM ridden hellscape completely devoid of any human users, and we’ll have to go back to our grandparents’ old dusty paper encyclopedias.
Eventually, in a decade or two, once the bubble has burst and google, meta, and all those bastards have starved each other to death, we might be able to start rebuilding a new internet, probably reinventing usenet over ad-hoc decentralised wifi networks, but we won’t get far, we’ll die in the global warming wars before we get it to any significant size.
At least some bastards will have made billions out of the scam, though, so there’s that, I suppose. 🤷‍♂️
Sure does, but somehow many of the answers still work well enough. In many contexts, the hallucinations are only speed bumps, not show stopping disasters.
It told people to put glue in their pizza to make the dough chewy. It’s pretty fucking awful.
Copilot wrote me some code that totally does not work. I pointed out the bug and told it exactly how to fix the problem. It said it fixed it and gave me the exact same buggy trash code again. Yes, it can be pretty awful. LLMs fail in some totally absurd and unexpected ways. On the other hand, it knows the documentation of every function, but somehow still fails at some trivial tasks. It’s just bizarre.
It does this because it inherently hallucinates. It’s just an analytical letter guesser that sounds human because it amalgamates and predicts the next word. It’s just gotten so much input that it can sound human. But it has no concept of right and wrong. Even when you tell it that it’s wrong. It doesn’t understand anything. That’s why it sucks. And that’s why it will always suck. It will not replace search because it makes shit up. I use it for coding here and there as well and it’s just making up functions that don’t exist or attributes functions to packages that aren’t real.
Probably. However, I will not be doing that because LLMs are dogshit and hallucinate bullshit half the time. I wouldn’t trust a single fucking thing that an LLM provides.
Fair enough, and that’s actually really good. You’re going to be one of the few who actually go through the trouble of making an account on a forum, ask a single question, and never visit the place after getting the answer. People like you are the reason why the internet has an answer to just about anything.
Haha. Yes I’ll be a tech Boomer. Stuck in my old ways. Although answers on forums are often straight misinformation so really there’s no perfect solution to get answers. You just have to cross check as many sources as possible.
And where do LLMs get their answers? Forums and social media. If an LLM doesn’t have the actual answer, it just blabbers like a redditor, and when someone can’t get an accurate answer, they go ask on forums and social media.
So no, LLMs will not replace human interaction, because LLMs rely on human interaction. An LLM cannot diagnose your car unless a human has diagnosed that problem first.
If an LLM doesn’t have the actual answer, it just blabbers like a redditor, and when someone can’t get an accurate answer, they go ask on forums and social media.
LLMs are completely incapable of giving a correct answer, except by random chance.
They’re extremely good at giving what looks like a correct answer, and convincing their users that it’s correct, though.
When LLMs are the only option, people won’t go elsewhere to look for answers, regardless of how nonsensical or incorrect they are, because the answers will look correct, and we’ll have no way of checking them for correctness.
People will get hurt, of course. And die. (But we won’t hear about it, because the LLMs won’t talk about it.) And civilization will enter a truly dark age of mindless ignorance.
But that doesn’t matter, because the company will have already got their money, and the line will go up.
They’re extremely good at giving what looks like a correct answer,
Exactly. Sometimes the thing that looks right IS right, and sometimes it’s not. The stochastic parrot doesn’t know the difference.
The problem is that the LLMs have stolen all that information, repackaged it in ways that are subtly (or blatantly) false or misleading, and then hidden the real information behind a wall of search results that are entire domains of AI trash. It’s very difficult to even locate the original sources or forums anymore.
I’ve even tried to use Gemini to find a particular YouTube video that matches specific criteria. Unsurprisingly, it gave me a bunch of videos, none of which were even close to what I’m looking for.
That’s true. There could be a balance of sorts. Who knows. If LLMs become increasingly useful, people start using them more. As they lose training data, quality goes down, and people shift back to forums etc. Could work that way too.
to an extent, yes, but not completely
No, because I ignore whatever AI slop comes up when I search for something
I have never found it to be anything other than useless. I will actively search for a qualified answer to my questions, rather than being lazy and relying on the first thing that pops up
To be fair, given the current state of search engines, LLMs might not be the worst idea.
I’m looking for the 7800x3d, not 3D shooters, not the 1234x3d, no not the pentium 4, not the 4700rtx. It takes more and more effort to search something, and the first pages show every piece of crap I’m not interested in.
Google made the huge mistake of placing the CEO of ads in charge of search.
And now it fucking sucks.
To be fair, given the current state of search engines, LLMs might not be the worst idea.
The current state of search engines is at least partially because the search engine owners have been trying to shove AI down the users’ throats already. Saying “go full LLM” is like saying “hmm, it’s hot in this pan, maybe it’s better to be in the fire underneath”.
The other, perhaps more important, part of search engine corruption is from trying to shove advertising down users’ throats. LLM in search will be twisted into doing the same thing, so that won’t save it either.
The other, other part is the fact that an increasing percentage of the Internet is made up of walled gardens and web apps that are all but impossible to index, and LLMs can’t help there. Pigboys that run the hard-to-index sites selling the content out from under the users notwithstanding.
Finally, as has been pointed out elsewhere, an LLM can only give an answer based on what was correct yesterday. Or last week. Or a decade ago. Even forums have this problem. Take the fact that unchangeable, “irreplaceable” answers on sites like StackOverflow reflect the state of things when the answers were written, not how things are now, years later.
What I’m worried about are traditional indexers being intentionally nerfed, discontinued, or left unmaintained at best. I’ve often wondered what it would take to self host a personal indexer. I remember a time when search giant Alta Vista had a full text index of the then known internet on their DEC Alpha server(s).
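For a sense of what the core of a personal full-text indexer looks like, here is a toy inverted index in Python. It is only a sketch: the page URLs and texts are placeholders, and a real self-hosted indexer would also need a crawler, persistent storage, and ranking, none of which is shown here.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each token to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every token in the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Placeholder pages standing in for crawled content.
pages = {
    "https://example.org/a": "Ryzen 7800X3D benchmark notes",
    "https://example.org/b": "Pentium 4 retro build log",
}
index = build_index(pages)
hits = search(index, "7800x3d benchmark")
```

Because the lookup is an exact token match, a query for "7800x3d" only returns pages that actually contain that token, which is the behaviour being missed from current search engines a few comments up.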
Alta Vista was great!
Now I’m definitely showing my age…
The problem lies with the way the “modern” internet works by loading everything dynamically. Static pages to index are becoming more rare. Also, a lot of information is being “lost” in proprietary systems like Discord. Those also can’t be indexed (easily).
I think you will be in a loud minority; people don’t like additional work.
Probably
But I don’t see it as work
“Work” is unfucking a situation that I created by being lazy in the first place rather than doing something properly
I’m probably showing my age though…
Even “let me Google that for you” was popular only some years ago. Yes, people are lazy, unthinking hedonists most of the time. In the absence of some sort of strict moral basis, society degenerates because only the tiniest minority will even think about things to try to establish some personal rules.
I still use https://lmgtfy.com/ as a public shame for anyone that can’t be arsed to put in a bit of effort to find something.
You only ignore AI slop when you recognize it as such.
I specifically ignore the google “AI summary”
I also tend to go through the results until I get something from a qualified source.
I’m sure I’m getting some of the aforementioned AI slop, but I would wager that I’m getting better results than the people I know who specifically look for an AI summary.
LLMs seem awesome in their knowledge until you hear their answers to stuff you already know, which makes you wonder if anything else was correct.
What they call hallucinations used to be called fabulation in other fields: inventing tales or stories.
I’m curious about what the shortest acceptable answer is for these things, and whether something close to “I don’t know” is even an option.
LLMs seem awesome in their knowledge until you hear their answers to stuff you already know, which makes you wonder if anything else was correct.
This applies equally well to human-generated answers to stuff.
True, the difference is that with humans it’s usually more public, so it is easier for someone to call bullshit. With LLMs the bullshit is served with the intimacy of embarrassing porn, so you’re less likely to see any warnings.
But LLMs truly excel at making their answers look correct. And at convincing their users that they are.
Humans are generally notoriously bad at that kind of thing, especially when our answers are correct.
Humans are generally notoriously bad at that kind of thing
Have you met humans? Many of them base their entire career on this skill.
Sure, but they’re a minority. Millions, at most, out of billions. Probably less than that.
All modern LLMs are as good as professional mentalists at convincing most of their users that they know what they’re saying.
That’s what they’re designed, trained, and selected for. Engagement, not correctness.
Sounds similar to Betteridge’s law of headlines.
I’m sure there are tricks like adding ‘fact check your response’, but I suspect there is something intrinsic to these models that makes it a super difficult problem. I get the feeling that LLMs are designed to please humans, so uncomfortable answers like “I don’t know” are out of the question.
- This thing is broken. How do I fix it?
- Don’t know. 🤷
- Seriously? I need an answer? Any ideas?
- Nope. You’re screwed. Best of luck to you. Figure it out. I believe in you. ❤️
Not designed, but trained. Training involves rewarding finding answers, so they WILL give you something. “I don’t know” is not going to fare well in the training development, so it naturally gets filtered out, while very creative (but wrong) LLMs do well.
There have been enough times that I googled something, saw the AI answer at the top, and repeated it like gospel. Only to look like a buffoon when we realize the AI was completely wrong.
Now I look right past the AI answer and read the sources it’s pulling from. Then I don’t have to worry about anything misinterpreting the answer.
True, but soon the sources will be AI generated too, in a big GIGO loop.
That’s exactly what I’m worried about happening. What if one day there are hardly any sources left?
At this rate that day is not too distant, I’m afraid.
I was expecting either Huxley or Orwell to be right, not both.
Interestingly, there’s an Intelligence Squared episode that explores that very point. As usual, there’s a debate, voting and both sides had some pretty good arguments. I’m convinced that Orwell and Huxley were correct about certain things. Not the whole picture, but specific parts of it.
Agreed, if we look closely we can find some Bradbury and William Gibson elements in the lovely dystopia we’re currently enjoying.
Oh absolutely. Cyberpunk was meant to feel alien and revolting, but nowadays it is beginning to feel surprisingly familiar. Still revolting though, just like the real world.
If the tech matures enough, potentially!
You’re not wrong about LLMs being (currently?) bad at tech support, but so are search engines lol
People will use whatever method of finding answers that works best for them.
Stuck, you contact tech support, wait weeks for a reply, and the cycle continues
Why didn’t you post a question on a public forum in that scenario? Or, in the future, why wouldn’t the AI search agent itself post a question? If questions need to be asked then there’s nothing stopping them from still being asked.
If you cut a forum’s population by 90% it will die.
This is one of the biggest problems with AI. If it becomes the easiest way to get good answers for most things, it will starve the channels that can answer the things it can’t (including everything new).
Depends which 90%.
It’s ironic that this thread is on the Fediverse, which I’m sure has much less than 10% the population of Reddit or Facebook or such. Is the Fediverse “dead”?
This is one of the biggest problems with AI. If it becomes the easiest way to get good answers for most things
If it’s the easiest way to get good answers for most things, that doesn’t seem like a problem to me. If it isn’t the easiest way to get good answers, then why are people switching to it en masse anyway in this scenario?
I said “cut a forum by 90%”, not “a forum happens to be smaller than another”. Ask ChatGPT if you have trouble with words.
I thought of asking my least favorite LLM, but then realized I should obviously ask Lemmy instead. Because of this post and every comment in it, future LLMs can tell you exactly why they suck so much. I’ve done my part.
That is an option, and undoubtedly some people will continue to do that. It’s just that the number of those people might go down in the future.
Some people like forums and such much more than LLMs, so that number probably won’t go down to zero. It’s just that someone has to write that first answer, so that eventually other people might benefit from it.
What if it’s a very new product and a new problem? Back in the old days, that would translate to the question being asked very quickly in the only place where you can do that - the forums. Nowadays, the first person to even discover the problem might not be the forum type. They might just try all the other methods first, and find nothing of value. That’s the scenario I was mainly thinking of.
I did suggest a possible solution to this - the AI search agent itself could post a question in a forum somewhere if it has been unable to find an answer.
This isn’t a feature yet of mainstream AI search agents but I’ve been following development and this sort of thing is already being done by hobbyists. Agentic AI workflows can be a lot more sophisticated than simple “do a search, summarize results.” An AI agent could even try to solve the problem itself - reading source code, running tests in a sandbox, and so forth. If it figures out a solution that it didn’t find online, maybe it could even post answers to some of those unanswered forum questions. Assuming the forum doesn’t ban AI of course.
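Roughly, the control flow I have in mind looks like the sketch below. Every name here is hypothetical: `search`, `try_solve`, and `post_question` stand in for whatever tools an agent would actually be wired to, and no real agent framework or forum API is implied.

```python
def answer_or_ask(question, search, try_solve, post_question):
    """Try searching, then solving; if both fail, post the question publicly."""
    found = search(question)
    if found is not None:
        return ("found", found)
    solved = try_solve(question)
    if solved is not None:
        return ("solved", solved)
    # Nothing worked, so the question goes back to humans.
    post_question(question)
    return ("asked", None)

# Stubs that simulate the worst case: nothing online and no solution found.
posted = []
outcome = answer_or_ask(
    "Why does the scanner reject my thumb drive?",
    search=lambda q: None,
    try_solve=lambda q: None,
    post_question=posted.append,
)
```

In the worst case the question still ends up in a public place where a person can answer it, so the scenario from the original post doesn't have to dead-end.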
Basically, I think this is a case of extrapolating problems without also extrapolating the possibilities of solutions. Like the old Malthusian scenario, where Malthus projected population growth without also accounting for the fact that as demand for food rises new technologies for making food production more productive would also be developed. We won’t get to a situation where most people are using LLMs for answers without LLMs being good at giving answers.
This idea about automated forum posts and answers could work. However, a human would also need to verify that the generated solution actually solves a problem. There are still some pretty big ifs and buts in this thing, but I assume it could work. I just don’t think current LLMs are quite smart enough yet. It’s a fast moving target, and new capabilities are being added on a daily basis, so it might not take very long until we get there.
However, a human would also need to verify that the generated solution actually solves a problem.
That’s already an issue with human-generated answers to problems. :)
“Verification” could be done by an AI agent too, though, as I described above. Depends on the sort of problem. A programming solution can be tested in a simple sandbox, a medical solution would require a bit more effort to validate (whether by human or by AI).
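For the programming case, the check can be as simple as running a candidate against known test cases. Here is a minimal sketch with made-up function names. Note that it runs the candidate in the same process, so it is not a real sandbox; an actual setup would isolate untrusted code first.

```python
def verify(candidate, cases):
    """Return True only if candidate gives the expected value for every case."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            # A crashing candidate counts as a failed verification.
            return False
    return True

def good_hours(weeks):
    """A correct candidate: hours in the given number of weeks."""
    return weeks * 7 * 24

def bad_hours(weeks):
    """A confidently wrong candidate, like the 'answer is 10' example."""
    return 10

cases = [((1,), 168), ((2,), 336)]
print(verify(good_hours, cases))  # True
print(verify(bad_hours, cases))   # False
```

The same caveat applies to humans and AI alike: the verification is only as good as the test cases someone wrote.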
I just don’t think current LLMs are quite smart enough yet.
Certainly, we’re both speculating about future developments here.
LLMs are the big block V8 of search engines. They can do things very fast and consume tons of resources with subterranean efficiency. On top of that, they are privacy invasive, easy to use for manipulation and speed up the problem of less mature users being spoon fed. General purpose LLMs need to be outlawed immediately.
prohibition of anything is usually a bad idea
Right. How about csam, incest, cannibalism?
arguments like this are fucking stupid
Glad you agree. Non arguments are not a good idea.
No, your argument is stupid. OF COURSE those things are bad; it’s stupid to think that’s what I implied.
You made a blanket statement and now you’re angry because someone called you out on it. I get that. But I don’t care. Please don’t make blanket statements like that. That’s not a good way of debating stuff.
Of course outlawing of stuff is good in certain cases. And LLMs (and AI in general) as a public tool, exploited for profit, isn’t good for humanity. It sucks energy like crazy, produces bullshit results, diseducates people and further benefits the capitalist class.
It’s just not okay to have that. I would have gone with an argument that goes “but how about for personal use on your own computer?” Then I would say I can see that being okay, as long as it doesn’t permanently increase everyone’s personal power usage, because that would be the same as having giant centralized AIs.
See? You can argue against my point without making self defeating statements.
I’m not angry at all. I just think your response is childish.
Silly me, I forgot that running an LLM model was so similar to cannibalism.
Thanks for showing that you have no actual arguments.
LLMs are inherently bad for society in their current form. They have no real benefit. They push capital extraction and further increase the pressure on workers. They have insane energy requirements, insane hardware requirements. We are working on saving our planet and can absolutely not spare the massive amounts of energy required for this shit.
Thanks for showing that you have no actual arguments.
You did it first by jumping to “think of the children!” and analogizing running a program to cannibalism.
They have no real benefit.
No need to ban them, then. Nobody will use them if this is true.
They have insane energy requirements, insane hardware requirements.
I run them locally on my computer, I know this is factually incorrect through direct experience.
Personal experience aside, if running an LLM query really required “insane” energy and hardware expenditures then why are companies like Google so eager to do it for free? These are public companies whose mandates are to generate a profit. Whatever they’re getting out of running those LLM queries must be worth the cost of running them.
We are working on saving our planet
I see you’ve switched from “think of the children!” to “think of the environment!”
You just showed again that you have no actual arguments. You’re using populism to “win” against factually correct and provable statements.
Using anecdotal evidence is a cheap trick and I believe you know it. It’s not evidence at all. Numbers show that I’m right and you’re wrong in this case.
“Think of the children” is used as a thought stopper by the political right to push their laws against humanity through. It isn’t as smart as you think to wrongly ascribe it. I was right and showed it, you can’t live with it. That’s okay.
Using anecdotal evidence is a cheap trick and I believe you know it. It’s not evidence at all. Numbers show that I’m right and you’re wrong in this case.
So… got any?
“Think of the children” is used as a thought stopper by the political right to push their laws against humanity through.
I refer you back to your earlier comment analogizing LLMs to “csam”.
I haven’t looked into many LLMs, but Microsoft will use your data for training the next version of Copilot. If you’re a paying enterprise customer, then your data won’t be used for that.
I suspect Google is also using every bit of data they can get their hands on. They have a habit of handing out shiny new stuff in exchange for your data. That’s exactly why Android and Chrome don’t require your money.