• leftzero@lemmynsfw.com
    7 hours ago

    there doesn’t seem to be any reason why synthetic data can’t be used for the whole training run

    Ah, of course, it’s LLMs all the way down!

    No, but seriously, you’re aware they’re selling this shit as a replacement for search engines, are you not?

    • FaceDeer@fedia.io
      51 minutes ago

      No, it’s not “LLMs all the way down.” Synthetic data is still ultimately built on raw data; generating it just improves the form that data takes, and the process includes lots of curation steps to filter it for quality.

      I don’t know what you mean by “a replacement for search engines.” LLMs are commonly being used to summarize search engine results, but there’s still a search engine providing them with sources to generate that summary from.