Selfhosted & AI - Part 2: The Results

curbstickle@anarchist.nexus · 8 hours ago

TomAwezome@lemmy.world · 34 minutes ago

Those three tags for promo posts seem like a pretty good compromise; I don’t really have any better suggestions for the exact acronyms or tag-specific descriptions. I use LLMs for personal and work projects, but I don’t post promotional material about any of it. I think most people using AI for personal side projects aren’t making promo posts about it either, so this already won’t affect most people. The most vocal users in Lemmy selfhosted are going to downvote the hell out of anything that has an AGENTS.md or a single commit that smells AI-generated, regardless of the tags; this will mostly speed up the dogpiling.

curbstickle@anarchist.nexus · 16 minutes ago

regardless of the tags, this will mostly speed up the dogpiling.

Yep.

But single-word comments getting posted repeatedly can be removed, so while I don’t think the up/downvotes will change, I think the comment section will.

And maybe those folks who rush to downvote will realize they can just filter out the posts instead. We’ll see how that works out though.

irmadlad@lemmy.world · 6 hours ago

You’re doing a good job herding cats @curbstickle@anarchist.nexus. I don’t envy you.

curbstickle@anarchist.nexus · 7 hours ago

As a thought, if a repo is already using ai-declaration.md or a similar AI disclosure, I think posting a link to that declaration as the reply to the AIP comment should count as the declaration reply, since they are already providing that information.
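A rough sketch of how that check could be automated, assuming a mod script already has the reply text in hand: it treats a reply as pointing at an existing declaration if any link in it ends in a known declaration file name. Only ai-declaration.md comes from this comment; the other file name and the function name are hypothetical.

```python
import re

# ai-declaration.md is from the comment above; ai-disclosure.md is an
# assumed alternative name. Extend as needed.
DECLARATION_FILES = ("ai-declaration.md", "ai-disclosure.md")

# Any http(s) link up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")


def links_to_declaration(reply_text: str) -> bool:
    """Return True if the reply links to a known AI declaration file."""
    for url in URL_RE.findall(reply_text):
        # Drop trailing ")" / "." / "," left over from prose or markdown links,
        # then any fragment or query string.
        path = url.rstrip(").,").split("#")[0].split("?")[0]
        if path.lower().endswith(DECLARATION_FILES):
            return True
    return False
```

A reply like "See [here](https://example.com/repo/AI-Declaration.md)." would count, while a link to a plain README.md would not.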

brucethemoose@lemmy.world · edit-2 6 hours ago

Failure to provide a disclosure after using the tag would mean removing the post. It could be locked, but I would have to assume the majority of the spam-type postings that happened to make it past the rule 7 criteria are the ones who will not provide the requested disclosure. I think it makes for a good filter this way, but please comment if you think otherwise.

Sounds reasonable to me!

I think the major choice is for y’all (the mod team), as enforcing a tagging system is going to increase the moderation workload. Though I guess it would cut back on AI reports, like you said.

I have no recommendations for an existing bot.

…You could use an embeddings model for a little extra automation though.

This is a pre-LLM thing, but basically you could feed a script new untagged posts, use an embeddings model to compare the text of their bodies to a keyword (“AI”?), and spit out a number as a rough “similarity” metric. If it’s above a certain threshold (e.g. if the post seems AI-related), send a message to the moderation team to check it, or maybe even post a rules reminder in the comments.

And FYI, embeddings models are tiny, so it doesn’t need special resources to run or anything.
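A minimal sketch of that pipeline in Python. To keep it runnable without downloads, `toy_embed` is a stand-in that just counts words from a tiny fixed vocabulary; a real setup would swap in an actual embeddings model, which captures meaning rather than exact word matches. The function names, vocabulary, keyword, and 0.3 threshold are all assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str, vocab: Sequence[str]) -> list:
    """Stand-in for a real embeddings model: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def should_flag(
    body: str,
    keyword: str,
    embed: Callable[[str], Vector],
    threshold: float = 0.3,  # arbitrary; would need tuning on real posts
) -> bool:
    """Flag an untagged post for mod review if its body is 'close' to the keyword."""
    return cosine_similarity(embed(body), embed(keyword)) >= threshold
```

Usage: with `vocab = ["ai", "llm", "claude", "docker", "backup"]` and `embed = lambda t: toy_embed(t, vocab)`, a post about a Docker backup tool scores 0 against "AI LLM" and is left alone, while one mentioning Claude and an LLM scores well above the threshold. Cosine similarity is a common choice for comparing embeddings because it ignores vector length, so a long post body and a one-word keyword can still come out as "close".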

curbstickle@anarchist.nexus · 6 hours ago

Don’t think I need the model tbh; I’m generally on enough to address the untagged posts. The annoying part would be making the same comment over and over again (thus the short bit of Python).
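Something like the following is roughly what a short bit of Python for this could look like, assuming the moderator has already decided a post needs a tag and just wants the canned reply. Only [AIP] and [AIT] are named in this thread, so `KNOWN_TAGS` covers just those two; the reminder wording and function names are hypothetical.

```python
import re
from typing import Optional, Set

# Only the two tags named in this thread; the full tag list is an assumption.
KNOWN_TAGS = {"AIP", "AIT"}

# Bracketed tags such as "[AIP]" anywhere in the title.
TAG_RE = re.compile(r"\[([A-Za-z]+)\]")

# Hypothetical wording; the real text is up to the mod team.
REMINDER = (
    "Hi! This post is missing an AI tag. "
    "Please edit the title to add the appropriate tag (see the community rules)."
)


def title_tags(title: str) -> Set[str]:
    """Return the bracketed tags in a post title, upper-cased."""
    return {tag.upper() for tag in TAG_RE.findall(title)}


def reminder_for(title: str) -> Optional[str]:
    """Return the canned reminder if the title carries none of the known tags."""
    if title_tags(title) & KNOWN_TAGS:
        return None
    return REMINDER
```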

timochka@lemmy.zip · 4 hours ago

Christ.

When will the forward-planning sub-committee of the AI tagging steering group be meeting? I presume they’re going to need to submit a motion to the ways-and-means council sub-sub-committee first and then maybe we can expect a notice on the procedure to follow for interim planning permission to write a post? Will interim planning permission allow the post to be made (subject to the countersignature of the automated post approval bot) or should it be saved in Drafts and then a separate submission (noting the interim permission and any objections received in the consultation period) be made to the full plenary session of the zoning committee?

Or do we just say “fuck this shit” and find another group?

curbstickle@anarchist.nexus · 3 hours ago

You are more than welcome to offer another option.

I’ll mention:

No tag is generating reports
No tag is causing a bunch of unhelpful comments
No disclosure is generating reports
Too basic a disclosure is generating reports

Please, feel free to provide an option.

I’ll point out that what you’re commenting on specifically applies to promo project posts, and nothing else.

Brkdncr@lemmy.world · 6 hours ago

Sounds like too many rules to me. I’d recommend a “no low effort ai” rule.

Also, AIT is regularly used to abbreviate AI Tool.

richmondez@lemdro.id · 19 minutes ago

I think anything over the “assisted” threshold in the OP is low effort and should be dumped.

curbstickle@anarchist.nexus · 6 hours ago

The only ones with extra effort will be promo posts, and this disclosure is regularly requested of them anyway.

You’d also need to define “low effort ai”.

I don’t see that working, sorry.

Brkdncr@lemmy.world · 6 hours ago

Asking people to tag AI, and also have a few different AI tags, and also read more than 3 sentences…mods are going to be busy enforcing the rules.

curbstickle@anarchist.nexus · 6 hours ago

That would be me, yes. And considering what I already get reports on, this makes for clear practice and would overall reduce the issues that are currently out there.

brucethemoose@lemmy.world · 6 hours ago

Normally I’d agree, but the tagging rule won’t affect the majority of posts. I think it’s an acceptable complication, in this case.

Especially with how much vibecoded spam is on the horizon.

brucethemoose@lemmy.world · 6 hours ago

Vibecoded spam is deliberately engineered to look “high effort,” so even with the vagueness of such a rule, it wouldn’t cover the spam so well.

Brkdncr@lemmy.world · 6 hours ago

How would the proposed rules help? Isn’t spam already covered regardless of AI?

brucethemoose@lemmy.world · edit-2 5 hours ago

Because, with a cursory glance, it doesn’t always look like spam.

A classic example I see starts with “I built a…” in the title, has a wall of text in the description, and actually promises to do something interesting. Only upon deeply inspecting the code (or trying it yourself) does it become clear that it’s hallucinated nonsense.

And it’s not always malicious, either. A lot of devs get deep in AI psychosis and truly believe they’re building something revolutionary with their vibe coding agent.

And sometimes these projects are interesting!

Hence it would be EXTREMELY helpful to have this tagged, up front. To me, an [AIP] is a gigantic red flag that warrants extra caution, but not necessarily a smoking gun, and it would help “regular” homebuilt projects stand out from the vibecoded ones.

And [AIT] is just nice to have. Some users don’t want to see any AI in /c/selfhosted, period. Hence AI discussion posts get reported as spam because people interpret them as spam, and this would clarify that nebulous distinction, while giving those users a way to easily filter AI posts out.
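For the filtering side, a hypothetical client-side sketch: given a list of post titles, drop any that carry a hidden tag. Which tags to hide is a personal choice; hiding both [AIP] and [AIT] is only an example, and the function name is made up for illustration.

```python
import re
from typing import AbstractSet, Iterable, List

# Example preference: hide both AI tags discussed in this thread.
HIDDEN_TAGS = frozenset({"AIP", "AIT"})

# Bracketed tags such as "[AIT]" anywhere in the title.
TAG_RE = re.compile(r"\[([A-Za-z]+)\]")


def visible_titles(titles: Iterable[str], hidden: AbstractSet[str] = HIDDEN_TAGS) -> List[str]:
    """Keep only titles that carry none of the hidden tags (case-insensitive)."""
    kept = []
    for title in titles:
        tags = {t.upper() for t in TAG_RE.findall(title)}
        if not tags & hidden:
            kept.append(title)
    return kept
```

Because the tags sit in the title in a fixed bracket format, a plain regex is enough; no model or keyword guessing is needed, which is much of the appeal of tagging up front.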

Brkdncr@lemmy.world · 2 hours ago

I wish the mods best of luck with implementing and enforcing this.

AI generally doesn’t need a lot of special handling when it comes to policies. It’s like any other tool; it’s just made it a lot easier for people who don’t know how to code to get something made.

If anything, it might be easier for people to tag their level of experience.

richmondez@lemdro.id · 15 minutes ago

Maybe, but even experienced devs seem to fall into the trap of thinking their expertise means they can skim-review AI code and spot its mistakes, rather than taking the time to properly review and understand the code. Low effort is low effort regardless of your expertise.

brucethemoose@lemmy.world · 2 hours ago

Vibecoded self promo is a growing, specific spam problem though.

And an appreciable fraction of Lemmy/Piefed is “anti AI absolutist.”

I think that’s pretty unique.

Shimitar@downonthestreet.eu · 5 hours ago

Woah …

This is overly complex.

As a dev who sometimes publishes things (I don’t vibe code, but who doesn’t use AI nowadays?), this is way too complex. And there are zero projects today that don’t use AI in some form, even if only to search or bugfix…

hertg@infosec.pub · 5 hours ago

Then this must come as a surprise to you, but I do not use “AI” whatsoever. Not for coding, not for fixing bugs, and not for coming up with concepts. Crazy right?

Shimitar@downonthestreet.eu · 5 hours ago

Not at all …

You are free to do as you please, and I fully respect that.

I was also a no-AI coder, but slowly changed my mind as I learned how to use the tool PROPERLY, which can be quite useful.

Learning how to use it has been fun too, so I suggest you give it a try if you haven’t done so yet.

The first risk is abusing it. The second risk is trusting it. And there are many more risks, but AI is a knife and not a pistol: there are good uses for it, but you must be careful and use it properly all the time.

curbstickle@anarchist.nexus · 5 hours ago

Disclosure itself is needed, and I can confidently say there are enough people who are “no AI ever”, “all AI all the time”, and “only the AI use I agree with” to make something like this necessary.

About the only way to simplify would be to not define the disclosure types, just require disclosing it, but then half the thread will be discussing where and how if it’s not defined (along with a bunch of reports about not fully disclosing AI use).

If promo posts included that up front, I don’t think it would be an issue, but it’s rare that any post includes even “I used/didn’t use AI”, if that.

Shimitar@downonthestreet.eu · 5 hours ago

Disclosure is needed, I agree.

Let’s say it feels complex, and the tags won’t stop the discussion in the comments anyway… but it’s a start, so good for it.

curbstickle@anarchist.nexus · edit-2 4 hours ago

I’d love an idea to trim it down… but with the wide variety of ways AI can be used, it’s hard.

I’m a good example of the “problem” person in a way. I’ll test all kinds of things (including a completely, 100% vibe coded app posted here recently… in a sectioned-off VLAN of course), but what AI was used for influences where I look. Documentation? Ok, not the worst, but I’m going to check for human review/blatant LLM goofs. You used it to figure out how to talk to a serial-controlled endpoint? Ok, that’s what needs to be checked first.

You made the whole ass thing with Claude? I’ll test it like I said, but I doubt it would ever end up anywhere near my own production use; it’s more of a curiosity. 99 times out of 100, that level of generated is basically the same as calling it unmaintained imo.

So there is definite value to knowing where/how/how much, and if the comments just repeat things already stated and add “slop”, those are going to get deleted; it’s already disclosed, and the people who comment that should filter instead. It’s a two-way benefit this way as I see it.

That said - I’m always open to options here, but considering recent comments and reports since I’ve taken over moderating, something is definitely needed.

Edit: And just to mention - nothing is ever set in stone, if you’ve not seen my other comments about it. Should anything change, or it becomes unwieldy, or someone finds a workaround to abuse, whatever - it’s always open for discussion.