It wouldn’t be. It would still work. It just wouldn’t be exclusively available to the group that created it, so any competitive advantage is lost.
But all of this ignores the real issue - you’re not really punishing the use of unauthorized data. Those who owned that data are still harmed by this.
It does discourage the use of unauthorised data. If stealing doesn’t give you a competitive advantage, it’s not really worth the risk and cost of stealing it in the first place.
If you can still use it after you stole it, as opposed to not being able to use it at all… then it does give you an incentive.
If you did all the work, including the potentially criminal collection of data, but everyone else gets the benefit as well, that is not an incentive. You underestimate how selfish corporations can be.
OpenAI wouldn’t stay at the forefront of LLMs if every competitor got to use the model it spent money on training.