• 1 Post
  • 194 Comments
Joined 3 years ago
Cake day: July 3, 2023



  • qqq@lemmy.world to 196@lemmy.blahaj.zone · Slop detectives be like rule
    2 days ago

    It looks like Pangram specifically holds back 4 million documents during training and has a corpus of “out of domain” documents that they test against that didn’t even have the same style as the testing data.

    I’m surprised at how well it does; I really wonder what the model is picking out. I wonder if it’s somehow the same “uncanny valley” signal that we get from AI generated text sometimes.

    To show that our model is able to generalize outside of its training domain, we hold out all email from our training set and evaluate our model on the entire Enron email dataset, which was released publicly as a dataset for researchers following the extrication of the emails of all Enron executives in the legal proceedings in the wake of the company’s collapse.

    Our model with email held out achieves a false positive rate of 0.8% on the Enron email dataset after hard negative mining, compared to our competitors (who may or may not have email in their training sets) which demonstrate a FPR of at least 2%. After generating AI examples based on the Enron emails, we find that our false negative rate is around 2%. We show an overall accuracy of 98% compared to GPTZero and Originality which perform at 89% and 91% respectively.

    and

    We exclude 4 million examples from our training pool as a holdout set to evaluate false positive rates following calibration on the above benchmark.
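The quoted numbers (0.8% false positive rate, ~2% false negative rate, 98% accuracy) are just the standard confusion-matrix metrics computed on a held-out set. A minimal sketch, with made-up counts chosen only to reproduce rates of the same magnitude as the ones quoted above:

```python
def rates(tp, fp, tn, fn):
    """Return (false_positive_rate, false_negative_rate, accuracy)
    from confusion counts, treating "AI-written" as the positive class."""
    fpr = fp / (fp + tn)          # human text wrongly flagged as AI
    fnr = fn / (fn + tp)          # AI text wrongly passed as human
    acc = (tp + tn) / (tp + fp + tn + fn)
    return fpr, fnr, acc

# Hypothetical example: 10,000 human emails with 80 flagged (0.8% FPR),
# 10,000 AI-generated emails with 200 missed (2% FNR).
fpr, fnr, acc = rates(tp=9_800, fp=80, tn=9_920, fn=200)
print(f"FPR={fpr:.1%}  FNR={fnr:.1%}  accuracy={acc:.1%}")
```

Note that the FPR is measured against human-written text only, which is why holding out an entire domain (email) is the interesting part: it tests whether the detector's signal survives a style shift it never trained on.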




  • You’re suggesting that we replace THEM with an agent.

    I am not suggesting we replace anyone, least of all the open source community, so let’s not put words in my mouth

    I think the current code I see being generated is generally “good enough”. I’m not comparing it to perfect: I’m comparing it to people.

    If this were true, then open source projects would have much less of an issue with pull requests from sloperators.

    This doesn’t follow to me. A good tool in the hand of a crappy user doesn’t suddenly make good output. I specifically said that LLMs write good code in a specific setting. Clearly random person generating thousands of lines at a time for a project they don’t understand isn’t that setting.

    You seem very focused on crappy code generated by people who don’t know what they’re doing. The technology isn’t good enough for that, so yes, I agree it won’t work in that setting.


  • I’d push back on your point here with a few things:

    The primary one: the code doesn’t need to be perfect or even above average; average is perfectly fine. The idea here is comparing the AI to a human, not to perfection. I see this constantly with AI and I find it a bit disingenuous.

    I do truly believe what I said above will be possible within my career (I’m in my mid 30s), but it’s not really what I’m worried about right now. I think the current code I see being generated is generally “good enough”. I’m not comparing it to perfect: I’m comparing it to people.

    I read a comment once that still rings true - “Hallucinations” are a misnomer. Everything an LLM puts out is a hallucination; it’s just that a lot of the time, it happens to be accurate. Eliminating that last percentage of inaccurate hallucinations is going to be nearly impossible.

    I don’t see any reason you have to remove all hallucinations to get a good tool for autonomous development: humans aren’t perfect either. We compensate for that with processes and by checking each other’s work, but plenty still falls through the cracks.

    LLMs also have no understanding of context outside the immediate. Satire is completely opaque to them. Sarcasm is lost on them, by and large. And they have no way to differentiate between good and bad output. Or good and bad input, for that matter. Joke pseudocode is just as valid in their training corpus as dire warnings about insecure code.

    Have you seen output in which satirical code is actually included? I’m well aware of things like https://www.anthropic.com/research/small-samples-poison and the potential here. And do you not believe that either (a) these types of trivial issues would be caught by a person whose job was just to audit output or even (b) this type of issue could be caught by specially trained domain limited AIs designed to check output?


  • To your point then: what are your thoughts on this project? https://github.com/anthropics/claudes-c-compiler I’m not particularly interested in this use case right now but it seems more in line with what you’re interested in.

    I think it shows a lot of limitations but also a lot of potential. I don’t personally think the AI needs to get the code perfect on the first go – it has to be compared to humans and we definitely don’t do that.

    I really really dislike the way it’s being sold as a solution for things it’s in no way a solution for.

    Yes, of course. I think it’s important to look past the blowhards and think about what it’s actually doing: that is the perspective I’m trying to talk about this from.


  • I didn’t say “trust me bro”, and showing Claude submissions is sufficient for analyzing code in the context where I believe it is good: one file at a time and one task at a time. This is also the realm where humans are good. You are welcome to look at the project as a whole to determine the “project quality” as well: it’s open source. But I’m not here to argue: I believe this tech, barely in its infancy, is already quite good and going to get better, and I’m already considering what it will do to my life. If you don’t, that’s fine.


    I’ll add here that I find it very frustrating to talk about these “AI agents” and their code output, because it’s something we’re all close to and spent a lot of time learning. The concept of “a machine” getting “better than us” so quickly, against the background of an industry champing at the bit to replace humans, makes these discussions inherently difficult and really emotional. I feel genuine sadness when I think about it. If the world were different we’d probably all be stoked. I don’t want the AI to be better than me, and I currently don’t believe it is, but I think:

    1. My belief doesn’t stop the market. People do believe that it is better than me or at least good enough. This has a real effect on my life and the lives of people I know.
    2. I don’t see any fundamental reason it won’t get better at development. Part of the reason it struggles with large projects is context: that doesn’t sound like a fundamental engineering constraint to me, it sounds like a memory constraint. Specialization will also make it better and better I assume.
    3. Even if it is never better than me, it will certainly be more efficient and eventually the market will consider my time better spent correcting its output or guiding it, removing the fun part of the work in my mind.

    I don’t think my job is on the chopping block today: I don’t do development; I do security work. But I do think it will either be on the chopping block or fundamentally change sooner than I’m comfortable with.



  • Claude commits to GitHub with the same name no matter who uses it. You can see every single line of open source code it has written (for GitHub only of course): https://github.com/search?q=author%3Aclaude&type=commits&s=author-date&o=desc. Look around as you please, most of it is just fine.

    People I know to be good developers have also shared their experiences with it and say yes, it has written good code for them. I’ve personally used ChatGPT for very mundane tasks, and the code it output was more than adequate.

    It introduces security bugs and subtle bugs at probably the same rate as a human (I have no “citation” there, just what I’ve seen). It needs to be “driven” by a human, yes, but it’s not clear for how long it will need to be, and even if it always does, personally I don’t want my job to be to “drive an AI”.




  • I’ve been working in tech for about 10 years now as well, and I’m also just feeling tired. I’m a bit sad, because I like my job. I didn’t study computer science or anything in college; I just got work in security because I enjoyed it. It’s sad pretty much knowing it won’t be the same. I don’t really want to offload a lot of the work to an AI in the future.

    I’ve been getting more into learning to weld and work with wood. In the next few years I’ll probably consider starting a small custom furniture company.

    I feel like this part of the conversation is drowned out by the AI hype train and the AI hate train: the part where real people are seeing the real effects of a technology that is actually good, is likely going to get better, and has the potential to do significant social damage to a large part of the middle class.




  • qqq@lemmy.world to Privacy@lemmy.world · VPN only with ID check in UK?
    edited · 1 month ago

    I’d take it a step further and say this isn’t even a use case for a VPN at all. If you want to browse the web anonymously, a VPN doesn’t provide that guarantee: it only changes your source IP, which most services probably already treat as unreliable for tracking purposes anyway.

    Even for changing your IP to aid anonymity, Tor is the network-layer tool to use, because you get a much wider range of source IPs than the single one a VPN gives you. But even then there is still a lot of work left to “browse the web anonymously”.

    I think a lot of people don’t understand VPNs. They’re great privacy tools if you don’t trust the local network or your ISP, as all traffic is typically encrypted and headed for the same server, but being anonymous on the web is way more involved because you are much more than your IP address.
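To sketch the “you are much more than your IP address” point: a server that identifies visitors by hashing stable request attributes (a toy stand-in for real browser fingerprinting) gets the same identifier no matter which network the traffic arrives from. The header values and IPs below are hypothetical:

```python
import hashlib

def fingerprint(headers: dict) -> str:
    """Toy browser fingerprint: a hash of header values that do NOT
    change when you switch networks or turn on a VPN."""
    basis = "|".join(headers[k] for k in sorted(headers))
    return hashlib.sha256(basis.encode()).hexdigest()[:12]

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept": "text/html,application/xhtml+xml",
}

# Same browser seen from two different source IPs (home vs VPN exit):
home = ("203.0.113.7", fingerprint(headers))
vpn = ("198.51.100.42", fingerprint(headers))
print(home[1] == vpn[1])  # identical fingerprint; only the IP moved
```

Real fingerprinting uses far more signal (canvas, fonts, TLS parameters, timing), but the principle is the same: the VPN moved the IP and left everything else intact.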

    Btw, I’m not replying here because I think you don’t understand all that; just expanding on the conversation.