• 0 Posts
  • 13 Comments
Joined 2 years ago
Cake day: September 7th, 2023

  • Happens very frequently in gaming communities because European servers often get abbreviated as EU, so people just keep using it to mean Europe in other contexts.

    Honestly unsure if the times I’ve used "euro" were just because someone else did first, or because I felt it necessary to distinguish from the EU and didn’t feel like writing “european” (though actually yeah, I don’t think I’ve ever seen it used by Europeans to mean “Europe”, only “European”).




  • I’m mainly wondering how in the everloving fuck the average guess for “live in New York City” is apparently 30%? Surely that has to be trolls answering 100% and skewing the average? The number of NYC residents I have floating around in my head is 20 million for some reason (which as it turns out is already a vast overestimation), which would be around 6% with the 330 million I have floating around for the US population (which is pretty close to the real number).

    I know the US education system isn’t great, but surely people at least have some very basic knowledge about their own country?

    The bi thing is almost certainly your bubble. Younger generations (just gonna guess you aren’t beyond your mid thirties at most; if you are, I’d find your experience very surprising) already skew more towards expressing non-hetero sexualities, and being around more ideologically left-wing groups likely skews it heavily. It could also be that some people feel some same-sex attraction, but still identify as hetero.

    The numbers do seem about right in general. Bisexuality is a weird thing because I’m also quite convinced some degree of it is extremely common, but that doesn’t mean all those people identify that way.
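
For anyone who wants to check the back-of-the-envelope math above, here is a rough sketch. The 20 million and 330 million are the guesses from the comment; the 2020 census figures and the helper names are additions for illustration, and the "everyone else answers honestly, trolls all answer 100%" setup is a simplifying assumption, not something taken from the survey.

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Guessed figures from the comment above.
guess_share = share_percent(20_000_000, 330_000_000)

# 2020 census counts (NYC proper and total US), added here for comparison.
census_share = share_percent(8_804_190, 331_449_281)


def troll_fraction_needed(honest_mean, target_mean, troll_answer=100.0):
    """Fraction of respondents who would have to give troll_answer for the
    overall mean to reach target_mean, assuming (hypothetically) that
    everyone else averages honest_mean."""
    return (target_mean - honest_mean) / (troll_answer - honest_mean)


needed = troll_fraction_needed(guess_share, 30.0)

print(f"share with guessed numbers: {guess_share:.1f}%")
print(f"share with census numbers:  {census_share:.1f}%")
print(f"fraction of 100% answers needed for a 30% mean: {needed:.0%}")
```

Under those assumptions, about a quarter of respondents would have to answer 100% to pull the average up to 30%, so trolls alone seem like a stretch as the whole explanation.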


  • I think there’s a blurry line here where you can easily train an LLM to just regurgitate the source material by overfitting, and at what point is it “transformative enough”? I think there’s little doubt that current flagship models usually are transformative enough, but that doesn’t apply to everything using the same technology - even though this case will be used as precedent for all of that.

    There’s also another issue in that while safeguards are generally in place, without them LLMs would be very capable of quoting entire pages, at least of popular books. And jailbreaking LLMs isn’t exactly unheard of. They also at least used to really like just repeating news articles on obscure topics verbatim.

    What I’m mainly getting at is that LLMs can be transformative, but they can also plagiarize. Much like any human could. The question is then, if training LLMs on copyrighted data is allowed, will the company be held accountable when their LLM does plagiarize, the same way a person would be? Or would the better decision be to prohibit training on copyrighted data because actually transforming it meaningfully cannot be guaranteed, and copyright holders actually finding these violations is very hard?

    Though idk the case details, if the argument was purely focused on using the material to produce the model, rather than including the ultimate step of outputting text to anyone who asks, it was probably doomed to fail from the start and the decision makes perfect sense. And that doesn’t seem too unlikely to have happened because realizing this would require the lawyer making the case to actually understand what training an LLM does.