• 10 Posts
  • 2.59K Comments
Joined 2 years ago
Cake day: March 22nd, 2024


  • Anything pro-African is mostly neutral but in essence ignored.

    This is sad.

    I’m American, and I want the Fediverse to be Euro, Africa, South America heavy. Basically anyone but the usual suspects that dominate news. I want to see stuff from other countries.

    I was fortunate enough to get to visit Tanzania, and it was great. It’d be nice if your continent took over the world. Please…


    Anyway, be aware that many of the authoritarian shills actually live in the US, Western Europe or wherever. Some do not, but the bulk seem to.

    They’re just terminally online.

    They don’t know squat about what’s actually going on in North Korea or Iran or China, and they wouldn’t be here if they did, because lemmy.ml is banned in China, Iran, and obviously North Korea.


    Talk to the Eastern European Fediverse folks.

    They know precisely what the deal is. You do not hear them praising the Soviets, that’s for sure.


  • Other companies are taking advantage of their role as developer/publisher to insert their own launcher to force me to create an account on their service

    Friend, no one is forcing you to do anything. Steam isn’t being “taken advantage of.” That’s how Steam sells you the game, and if Valve didn’t like it, they wouldn’t list the game in the first place.

    You don’t like it? Don’t make an account, refund the game.


  • In past crypto busts, Nvidia bought used datacenter GPUs and threw them away, to keep prices of new cards high.

    They will undoubtedly do this again. Probably AMD/Intel too.

    And “FOMO” style crypto mining sites were just abandoned or repurposed AFAIK.


    But honestly, I don’t know what will happen now. Especially to the “quick and dirty” sites like Meta’s server tents, all the supposedly temporary evaporative cooling/gas generators and such.

    Used server GPUs are still pretty good processors for all sorts of things. I would guess that Nvidia pivots towards robotics and “business virtual reality,” kinda like they’re already pivoting towards more utilitarian LLMs with the Nemotron releases, so maybe the surviving GPUs will get used for that.



  • It’s not democracy though.

    Whatever the ideals of crypto are, however user friendly it could be made, in reality it’s just fundamentally too easy to abuse.

    As-is, it’s one of those “it would work fine if everyone learned it in detail, and grifters would go away” ideas, and that’s not going to happen.

    Democracy is fragile and exploitable too, but it has a track record of working across general populations for reasonable lengths of time.


  • I suspect it would work. But the false positive rate would be really high.

    In other words, they could probably detect sloppy junk reasonably well, but I suspect it would flag too many human PRs to make the automation particularly useful.


    That, and the good-seeming vibe coded PRs are the ones to worry about. Those are the ones that seem to slot in, but might have an error or general misunderstanding somewhere in them that’s just really hard to detect, as it would be common sense to a human working on the project, but not to an LLM agent.

    As a random specific example, I had a local LLM + Gemini 3.1 fix this issue with a Rimworld mod for me. It was really simple; just changing one line in an XML file.

    But neither of them realized the change was, ultimately, bad practice. They re-defined something inherited from a parent class, which would prevent other mods’ changes in that parent class chain from percolating down to this. Any basic Rimworld modder would know this is a recipe for trouble, but an LLM isn’t cognizant like that and has no clue.

    Now: imagine that, but in a huge PR for a complex codebase.

    It’s just too much to look for. The LLM could make a non-obvious, “inhuman” mistake at any point.
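    The inheritance problem described in this comment can be sketched with a toy model. This is not the real Rimworld def loader, and every name in it (`resolve`, `BaseGun`, `MyRifle`, `range`, `cooldown`) is hypothetical; it only assumes the general rule that a child definition's own values win over inherited ones.

```python
# Toy model of parent/child definition inheritance (NOT the real Rimworld loader).
# All names here (resolve, BaseGun, MyRifle, range, cooldown) are hypothetical.

def resolve(defs, name):
    """Return the effective fields for `name`: parent fields first, child overrides on top."""
    node = defs[name]
    parent = node.get("parent")
    fields = resolve(defs, parent) if parent else {}
    fields.update(node["fields"])  # anything the child re-defines wins
    return fields

defs = {
    "BaseGun": {"parent": None, "fields": {"range": 20, "cooldown": 1.5}},
    # The "one-line fix": the child re-defines a value it used to inherit.
    "MyRifle": {"parent": "BaseGun", "fields": {"range": 25}},
}

before = resolve(defs, "MyRifle")

# Another mod later changes the parent, expecting children to pick it up.
defs["BaseGun"]["fields"]["range"] = 30
defs["BaseGun"]["fields"]["cooldown"] = 1.0

after = resolve(defs, "MyRifle")
print("before parent change:", before)
print("after parent change: ", after)
```

    In this sketch the untouched field (`cooldown`) still follows the parent, while the re-defined one (`range`) silently stops tracking it. Nothing errors out, which is why this kind of mistake is hard to spot in review.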


  • Rarely of course, something is so complicated that it actually takes more time to come up with the right code than do a review. But that is only a rare thing.

    This is definitely a thing though.

    On this very topic, many llama.cpp PRs are good examples. A model trainer may present a PR with poor understanding of the (very complicated, highly specialized, sparsely documented) project. Then a maintainer comes to fix it, but has absolutely no knowledge of certain things the model trainer would know (“Oh, the whole thing NaNs if this one value on layer 23 isn’t FP32!”)

    There has to be a back-and-forth. A whole lot of it.


    That is an exception, yeah.

    But I’m not sure I’d call it “rare.” There are definitely situations where fixing without explaining is ultimately a whole lot of work.


  • That doesn’t fix the cost issue, though.

    Basically, with the world’s current demographic trajectory, an absolutely massive chunk of global production needs to be allocated towards such caregivers and the elderly. There’s no way around it.

    And yes, the management structure is completely screwed up, but I’m just saying that’s missing the forest for the trees. It masks the bigger issue. If you eliminated every single manager, every ancillary position, every investor or owner, it wouldn’t even be close to enough.



  • I think it will massively correct, like the dotcom bubble for websites. LLMs are a useful utility, but not something that’s going to make economics irrelevant (like people thought about the internet).

    Why? LLMs are tools, text models, not AGI magic lamps, and a couple of con artists are trying to convince the world otherwise. That’s an oversimplification, but that’s the gist of it.

    And I’m no LLM skeptic. I’ve been playing with ML as a hobby for a decade, with local LLMs before ChatGPT was even available, but the market attitude towards all this is absolutely bonkers. It’s worse than crypto.



  • …Because it’s ridiculously labor intensive? And emotionally draining?

    Have y’all ever had to take care of someone really old, with failing health? Or really young?


    I’m not saying there aren’t huge structural issues, or leeches on the system. But it’s fundamentally hard. Taking care of just a few others will absolutely drain a professional, and paying them a livable wage + tax/benefits, with no other expenses whatsoever, will drain savings of those taken care of.





  • It’s more complex than that. The weights of big models are distributed, and then tokens are processed in parallel for multiple users. The setup varies, but it could be 8 GPUs serving many dozens of users at once, or bigger sets with even more parallelism.


    I think the bigger problem is that Copilot is… shit.

    It’s probably some ancient, inefficient architecture, not something super sparse and hardware efficient like (say) Deepseek V4, or Kimi 2.6, or Gemini Pro.

    And literally every interesting dev team Microsoft has ever acquired (Phi, WizardLM, many more), and any interesting innovation they figured out, has just disappeared into a black hole.

    They don’t have custom hardware, either, like Huawei NPUs or Cerebras WSEs, or Google TPUs. They’ve written some very interesting papers on that, and proceeded to do squat with them.

    Also, it is AWFUL for its size. Tiny models that are basically free run circles around Copilot.

    What I’m getting at is that Copilot is probably the most inefficient LLM out there. Like, it’s impressive how bad it is.


  • I use sigma N sampling at 1.0, a slop phrase banlist, and maybe a little rep penalty.

    Beyond that it depends on the usage.

    For scripts or “questioning a document,” it’s as low as can be until it loops. I start with zero temperature. But I don’t really use Gemma for coding, TBH, and it’s not good for longer documents.

    If it’s for a specific language or a very specific script, I sometimes constrain grammar for the language.

    For more “general” writing, like brainstorming or RP or whatever, I start at around 0.7 with minimal DRY sampling and look at the logit percentages in the Mikupad UI. Especially “important” tokens like names or information recall. If the probability of getting correct answers is too low, I turn the temperature down.

    …But honestly, I tend to use big MoEs instead of Gemma for that, too.


    And if none of this makes any sense…

    Yeah. That’s the problem.

    Sampling was supposed to be a temporary stopgap until looping and such was figured out, but the big LLM devs just never addressed it in production. There are all sorts of interesting papers, including one from Google about sampling logits per-layer, but they don’t implement any of them in the API models.
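    For anyone lost on the temperature part of this comment, here is a minimal sketch of what turning the temperature down does to the probability of one "important" token. The candidate names and logit values are made up for illustration, and real inference engines apply other samplers around this step.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one "important" position, e.g. a character's name.
candidates = ["Alice", "Alina", "Bob"]
logits = [4.0, 3.0, 1.0]

results = {}
for t in (1.0, 0.7, 0.3):
    probs = softmax_with_temperature(logits, t)
    results[t] = probs
    print(t, {c: round(p, 3) for c, p in zip(candidates, probs)})
```

    Lower temperature moves probability mass onto the top candidate, which is the knob being turned when the correct token's probability looks too low. Zero temperature is not handled by this function (it would divide by zero); it is normally treated as plain greedy selection of the highest logit.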


  • Gemini actually has a really interesting architecture, hence it has fast responses, and it’s easily the best long context model out there.

    And outside of benchmaxxing or pure coding, Gemma is very good for its size. 12B is an incredible multimodal LLM, the only one natively trained for image/text ingestion without a mmproj hacked on at the end.

    …But it sure feels like executive meddling kills it.

    The pattern I see is:

    • Gemini preview is released.

    • It’s genuinely good! It’s smart, it’s straight.

    • Then they “refine” it, and it gets more and more sycophantic, more deep fried. Long context performance degrades… benchmark scores go up, but anyone who actually uses it can immediately tell it’s gotten worse.

    • Only then, is it released for mass use.

    It’s obvious they took a good model, then enshittified it to make their bosses happy and tech bros on Twitter excited.

    Gemma has the same pattern. Researchers tease the local community, delay it, and then when a new Gemma finally comes out, it turns out to be using some old SWA architecture. And the biggest model is cut. And only a smaller one uses the multimodal training.

    It’s obvious it was neutered to not “threaten” the Gemini API or be too “unsafe.”


    Another thing I’ve noticed is that both Gemini and Gemma are awful with their default 1.0 temperature/top-p 0.95. Sampling completely screws them up. But they like low temperature + minp, and Gemma loves constrained sampling.

    But 99% of users don’t know anything about sampling, so that’s going to leave a bad impression.
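    A rough sketch of the difference between the two setups mentioned in this comment: top-p 0.95 at temperature 1.0 versus a lower temperature with min-p. The logits, the temperature of 0.5, and the min-p value of 0.1 are illustrative assumptions, not recommended settings, and real engines differ in the order they apply samplers.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return sorted(kept)

def min_p_filter(probs, min_p):
    """Keep tokens whose probability is at least min_p times the top token's."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Hypothetical logits: one clear favourite and a tail of weak candidates.
logits = [5.0, 2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0]

default_probs = softmax(logits, temperature=1.0)
cool_probs = softmax(logits, temperature=0.5)

top_p_kept = top_p_filter(default_probs, 0.95)
min_p_kept = min_p_filter(cool_probs, 0.1)
print("top-p 0.95 @ T=1.0 keeps token indices:", top_p_kept)
print("min-p 0.1  @ T=0.5 keeps token indices:", min_p_kept)
```

    With these made-up numbers, top-p has to reach into the tail to accumulate 95% of the mass, while min-p scales its cutoff to the top token and drops the tail when the model is confident. That is only an illustration of the mechanism, not evidence about how any particular model behaves.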


  • $15k would get you a used AMD server, a 5090 or a set of 3090s, and enough leftover cash for electricity to just run a 1T parameter LLM at home. Plus, it’s yours.

    And that’s hilariously inefficient.

    It’s completely nuts to me that people pay Anthropic per token, at that rate. I think 1 whole year for GLM’s coding plan was a flat $30, or something.