• 0 Posts
  • 66 Comments
Joined 1 year ago
cake
Cake day: July 3rd, 2023

help-circle
  • Glad you got fired. Vaccines should always be mandatory save for legitimate, doctor-validated medical exemptions.

    Anti-vaxxers are fucking stupid and should either be educated properly or, if they still refuse to do their civic duty after being de-programmed of misinformation, punished. You are only allowed to participate in society if you take the necessary steps that you are morally and ethically obligated to do in order to protect it from preventable, transmissible disease. We had eradicated polio until stupid motherfuckers like yourself decided that it would be a good idea to forgo the standard polio vaccine schedule that we’ve had for decades. Now, we saw the first case in 30 years in 2022 because someone selfishly thought that their personal beliefs were more important than the health and livelihood of everyone else.














  • I understand what you’re saying—I’m saying that data validation is precisely the purpose of parsers (or deserialization) in statically-typed languages. Type-checking is data validation, and parsing is the process of turning untyped, unvalidated data into typed, validated data. And, what’s more, is that you can often get this functionality for free without having to write any code other than your type (if the validation is simple enough, anyway). Pydantic exists to solve a problem of Python’s own making and to reproduce what’s standard in statically-typed languages.

    In the case of config files, it’s even possible to do this at compile time, depending on the language. Or in other words, you can statically guarantee that a config file exists at a particular location and deserialize it/validate it into a native data structure all without ever running your actual program. At my day job, all of our app’s configuration lives in Dhall files which get imported and validated into our codebase as a compile-time step, meaning that misconfiguration is a compiler error.






  • You do not understand how these things actually work. I mean, fair enough, most people don’t. But it’s a bit foolhardy to propose changes to how something works without understanding how it works now.

    There is no “database”. That’s a fundamental misunderstanding of the technology. It is entirely impossible to query a model to determine if something is “present” or not (the question doesn’t even make sense in that context).

    A model is, to greatly simplify things, a function (like in math) that will compute a response based on the input given. What this computation does is entirely opaque (including to the creators). It’s what we we call a “black box”. In order to create said function, we start from a completely random mapping of inputs to outputs (we’ll call them weights from now on) as well as training data, iteratively feed training data to this function and measure how close its output is to what we expect, adjusting the weights (which are just numbers) based on how close it is. This is a gross simplification of the complexity involved (and doesn’t even touch on the structure of the model’s network itself), but it should give you a good idea.

    It’s applied statistics: we’re effectively creating a probability distribution over natural language itself, where we predict the next word based on how frequently we’ve seen words in a particular arrangement. This is old technology (dates back to the 90s) that has hit the mainstream due to increases in computing power (training models is very computationally expensive) and massive increases in the size of dataset used in training.

    Source: senior software engineer with a computer science degree and multiple graduate-level courses on natural language processing and deep learning

    Btw, I have serious issues with both capitalism itself and machine learning as it is applied by corporations, so don’t take what I’m saying to mean that I’m in any way an apologist for them. But it’s important to direct our criticisms of the system as precisely as possible.


  • It’s got nothing to do with capitalism. It’s fundamentally a matter of people using it for things it’s not actually good at, because ultimately it’s just statistics. The words generated are based on a probability distribution derived from its (huge) training dataset. It has no understanding or knowledge. It’s mimicry.

    It’s why it’s incredibly stupid to try using it for the things people are trying to use it for, like as a source of information. It’s a model of language, yet people act like it has actual insight or understanding.