• Daemon Silverstein@thelemmy.club
    2 months ago

    As for data science using Python, something tells me this has to do with heap size limits. I’m not sure what Python’s maximum heap size is, but JavaScript through Node.js seems to default to only 512MB. I’ve been using Node.js to deal with big datasets, and my most recent experiment needed to load 100 million numbers into RAM: although my PC has a fair amount of physical RAM (12GB) and most of it was available, Node.js simply errored out while filling an array. I needed an additional parameter, --max-old-space-size, so that Node.js could handle that amount of data. I didn’t try the same task in Python because I’m more used to JavaScript (though I’ve done some things in Python), but I wonder how much memory Python can hold before an “out of memory” error happens, because ML models (for example, those hosted and served on HuggingFace) load weights that take up dozens of GBs.

    • it_depends_man@lemmy.world
      1 month ago

      I wonder how much memory Python can hold before an “out of memory” error happens, because ML models (for example, those hosted and served on HuggingFace) load weights that take up dozens of GBs

      All the LLM stuff and the actual “serious” Python libraries are implemented in C/C++ and only made accessible via Python.

      Which doesn’t directly answer the question of what the maximum is, in those cases, but it should be obvious that C/C++ have some good ways to deal with memory.

      You can still do “traditional” memory management in Python, or “memory-aware programming”, e.g. not reading a file in one piece but reading and processing it line by line.
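
      A minimal sketch of that line-by-line approach, assuming a text file with one number per line. The helper names (write_sample_file, sum_numbers_streaming) are made up for illustration; only one line is held in memory at a time, so the file size barely affects memory use.

```python
import os
import tempfile


def write_sample_file(n):
    """Write the integers 0..n-1, one per line, to a temporary file (illustrative helper)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for i in range(n):
            handle.write(f"{i}\n")
    return path


def sum_numbers_streaming(path):
    """Sum one number per line without loading the whole file into memory."""
    total = 0.0
    count = 0
    with open(path, "r", encoding="utf-8") as handle:
        # Iterating over the file object yields one line at a time,
        # unlike handle.read() or handle.readlines(), which load everything.
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += float(line)
            count += 1
    return total, count


if __name__ == "__main__":
    sample = write_sample_file(1000)
    try:
        print(sum_numbers_streaming(sample))
    finally:
        os.remove(sample)
```

      The same pattern works for any per-line processing, as long as the result you keep (here a running total) stays small.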

      And using C from Python is actually very easy and convenient with ctypes: https://docs.python.org/3/library/ctypes.html
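
      A rough sketch of what that looks like, assuming a Unix-like system where the C standard library can be located (on Windows the library lookup would need to change). The helper names are hypothetical. It shows two things: allocating a contiguous C array of doubles from Python, and calling a C function (strlen) directly.

```python
import ctypes
import ctypes.util


def make_c_double_array(n):
    """Allocate a zero-initialised, contiguous C array of n doubles."""
    return (ctypes.c_double * n)()


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen on a bytes object."""
    # Assumption: a Unix-like system. If find_library returns None,
    # CDLL(None) falls back to the symbols already loaded into the process.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)


if __name__ == "__main__":
    arr = make_c_double_array(1000)
    arr[0] = 1.5
    # Each element is a raw C double, not a full Python float object.
    print(ctypes.sizeof(arr), "bytes for", len(arr), "doubles")
    print(c_strlen(b"hello"))
```

      At 8 bytes per double, 100 million numbers stored this way would take about 800 MB of raw buffer, which is roughly how C-backed libraries keep large arrays compact.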