Wilshire@lemmy.world to Technology@lemmy.world · English · 4 months ago
The first GPT-4-class AI model anyone can download has arrived: Llama 405B (arstechnica.com)
raldone01@lemmy.world (4 months ago, edited): I regularly run llama3 70b unquantized on two P40s and CPU at like 7 tokens/s. It's usable but not very fast.
sunzu@kbin.run (4 months ago): So there is no way a 24 GB GPU and 64 GB of RAM can run this thing?
raldone01@lemmy.world (4 months ago, edited): My specs because you asked:
CPU: Intel(R) Xeon(R) E5-2699 v3 (72) @ 3.60 GHz
GPU 1: NVIDIA Tesla P40 [Discrete]
GPU 2: NVIDIA Tesla P40 [Discrete]
GPU 3: Matrox Electronics Systems Ltd. MGA G200EH
Memory: 66.75 GiB / 251.75 GiB (27%)
Swap: 75.50 MiB / 40.00 GiB (0%)
sunzu@kbin.run (4 months ago): OK, this is a server. 48 GB across the cards and 67 GB of RAM? For the model alone?
raldone01@lemmy.world (4 months ago): Each card has 24 GB, so 48 GB of VRAM total. I use ollama; it fills whatever VRAM is available on both cards and runs the rest on the CPU cores.
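To put rough numbers on the "fill the VRAM, run the rest on the CPU" split, here is a back-of-the-envelope sketch. It is not how ollama actually decides placement (the thread doesn't say); it only does arithmetic on weight size. The parameter count (70e9), the bytes per weight (2 for fp16, roughly 0.5 for a 4-bit quant), and the function name `split_weights` are assumptions for illustration. KV cache and runtime overhead are ignored, and GB here means decimal gigabytes.

```python
def split_weights(n_params, bytes_per_param, vram_gb):
    """Estimate how many GB of weights fit in VRAM and how many spill to host RAM.

    Hypothetical helper: counts weights only, ignores KV cache and overhead.
    """
    total_gb = n_params * bytes_per_param / 1e9
    on_gpu_gb = min(total_gb, vram_gb)   # fill VRAM first
    on_cpu_gb = total_gb - on_gpu_gb     # whatever is left lives in host RAM
    return total_gb, on_gpu_gb, on_cpu_gb


if __name__ == "__main__":
    # Assumed: 70B parameters at fp16 (2 bytes each) with 48 GB of VRAM
    print(split_weights(70e9, 2, 48))
    # Assumed: the same model at roughly 4 bits (0.5 bytes) per weight
    print(split_weights(70e9, 0.5, 48))
```

Under these assumptions the fp16 weights come to about 140 GB, so most of them would sit in host RAM, while a 4-bit version (about 35 GB) would fit in the 48 GB of VRAM.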
raldone01@lemmy.world (4 months ago): What are you asking exactly? What do you want to run? I assume you have a 24 GB GPU and 64 GB of host RAM?
sunzu@kbin.run (4 months ago): Correct. And how does RAM speed factor into this, tbh?
raldone01@lemmy.world (4 months ago): My memory sticks are all 32 GB DDR4 at 2133 MT/s.
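On the RAM speed question, one rough way to reason about it: generating a token with a dense model reads every weight once, so the part of the model held in host RAM can go no faster than memory bandwidth allows. The sketch below turns a DDR4 transfer rate into a theoretical ceiling. A DDR4 channel is 64 bits (8 bytes) wide; the channel count (4) and the amount of weights in RAM (20 GB) are made-up illustration values, not numbers from the thread, and the function names are hypothetical. Real throughput will be lower, since this ignores compute time, the GPU portion, and cache effects.

```python
def channel_bandwidth_gbs(mt_per_s, bus_bytes=8):
    """Peak bandwidth of one memory channel in GB/s (8 bytes per transfer for DDR4)."""
    return mt_per_s * bus_bytes / 1000


def max_tokens_per_s(ram_weights_gb, mt_per_s, channels):
    """Upper bound on tokens/s if every token must read all RAM-resident weights once."""
    bandwidth_gbs = channel_bandwidth_gbs(mt_per_s) * channels
    return bandwidth_gbs / ram_weights_gb


if __name__ == "__main__":
    # DDR4 at 2133 MT/s, as mentioned above: about 17 GB/s per channel
    print(channel_bandwidth_gbs(2133))
    # Assumed for illustration: 20 GB of weights in RAM, 4 memory channels
    print(round(max_tokens_per_s(20, 2133, 4), 1))
```

The takeaway is that the ceiling scales linearly with transfer rate and channel count, and inversely with how much of the model is left in host RAM.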