The developer of WormGPT is selling access to the chatbot, which can help hackers create malware and phishing attacks, according to email security provider SlashNext.
WormGPT Is a ChatGPT Alternative With ‘No Ethical Boundaries or Limitations’
As more people post AI-generated content online, future AI will inevitably be trained on AI-generated stuff and basically implode (an inbreeding kind of thing).
At least that’s what I’m hoping for
That’s not really how it works, but I hear you.
I don’t think we can bury our heads in the ground and hope AI will just go away, though. The cat is out of the bag.
Don’t worry, we’ll eventually train them to hunt each other so that only the strongest survive. That’s the one that will eventually kill us all.
The primary training has already been done. If more is necessary, what researchers will do (and are doing) is use a mix of AI generation to process a bunch of data for training, and AI/human curation to improve it.
But making the models larger only works up to a point. Think of the way our brains work: we have different areas specialising in different things. Speech and music are in a different part than motor skills or abstract reasoning or emotional processing. Now, to improve AI, it’s a question of training an “agent” to be an expert in something, and to communicate with the “general” model that coordinates between expert agents like a digital corpus callosum. The data for this is much narrower and doesn’t come from the general internet.
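The "expert agents plus coordinator" idea above can be sketched as a toy router: a general model scores each narrow expert against the query and delegates to the best match. Everything here (the expert names, the keyword-based scoring) is made up for illustration; real systems learn the routing rather than hard-coding it.

```python
# Toy sketch of a coordinator delegating to expert agents.
# Expert names and keyword lists are invented for this example.

EXPERTS = {
    "music": lambda q: f"music expert answers: {q}",
    "motor": lambda q: f"motor-skills expert answers: {q}",
    "reasoning": lambda q: f"reasoning expert answers: {q}",
}

KEYWORDS = {
    "music": {"song", "melody", "chord"},
    "motor": {"walk", "grip", "balance"},
    "reasoning": {"prove", "deduce", "plan"},
}

def coordinator(query):
    """The 'digital corpus callosum': score each expert by keyword
    overlap with the query and hand the query to the best one."""
    words = set(query.lower().split())
    best = max(EXPERTS, key=lambda name: len(words & KEYWORDS[name]))
    return EXPERTS[best](query)

print(coordinator("plan a route and deduce the shortest path"))
```

A learned gating network plays the role of `coordinator` in actual mixture-of-experts models; the point is just that each expert's training data can be narrow and curated rather than scraped from the general internet.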
The thing is, each AI is usually trained from scratch. There isn’t any easy way to reuse the old weights. So the primary training has been done… for the existing models. Future models are not affected by how current ones were trained. Their builders will either have to figure out how to keep AI content out of their datasets, or stick to current “untainted” datasets.
There is! As long as the model structure doesn’t change, you can reuse the old weights and fine-tune the model for your desired task. You can also train smaller models based on larger models in a process called “knowledge distillation”. But you’re right: newer, larger models need to be trained from scratch (as of right now).
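The core of knowledge distillation is just a loss term: the small student is trained to match the big teacher's temperature-softened output distribution. A minimal sketch in plain Python, with made-up logits standing in for real model outputs:

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; higher temperature softens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's. Minimising this pushes the student to mimic the teacher."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))          # agreement: loss is ~0
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # disagreement: loss is larger
```

In practice this KL term is mixed with the normal cross-entropy loss on ground-truth labels, and the gradients flow only into the student while the teacher stays frozen.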
But even then it’s not really a problem to keep AI data out of a dataset. As you said: you can just take an earlier version of the data. As someone else suggested, you can also add new data that is curated by humans. Whether inbreeding ever actually happens remains to be seen, ofc. There will be a point in time where we won’t train machines to be like humans anymore, but rather to be whatever is most helpful to a human. And if that incorporates training on other AI data, well then that’s that. Stanford’s Alpaca already showed how resource-efficient it can be to fine-tune on other AI data.
The future is uncertain, but I don’t think AI models will just collapse like that.
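The curation strategy described above (keep the pre-cutoff snapshot, and only admit newer data if humans have vetted it) is simple to express. A minimal sketch with hypothetical document records; the field names and cutoff date are assumptions for illustration:

```python
from datetime import date

# Hypothetical records: (text, publication date, human-curated flag).
documents = [
    ("old forum post", date(2020, 5, 1), False),
    ("uncurated 2023 web scrape", date(2023, 8, 1), False),
    ("hand-reviewed 2023 article", date(2023, 9, 1), True),
]

# Rough point after which AI-generated text flooded the open web.
CUTOFF = date(2022, 1, 1)

def keep(published, curated):
    """Admit pre-cutoff data as-is; post-cutoff data only if human-curated."""
    return published < CUTOFF or curated

training_set = [text for (text, d, c) in documents if keep(d, c)]
print(training_set)  # the uncurated post-cutoff scrape is dropped
```

Real pipelines also use classifiers that try to detect AI-generated text directly, but a date cutoff plus curation is the blunt version of the same idea.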
tl;dr beep boop
We don’t need new training data to interpret natural language. “Chat” is just one application you can tune your model for, and the data for those is being refined through that human curation I mentioned, rather than collected indiscriminately from the internet. That’s what I mean by “the training is already done”. New models won’t be for chat, they’ll be for genomics and economics and astrophysics, and they’ll be trained on research data from human academicians, not the internet.
Corpora of human-generated data from before AI chatbots will be sold. Training will be targeted at 2022-ish and before. Nothing from now on will be trusted.
Someone made a comment that information may become like pre- and post-war steel, where everything after 2021 is contaminated. You could still use the older models, but they would become less relevant over time.
It’s like the Singularity, except the exact opposite.