FlexGen

FlexGen is a high-throughput generation engine for running large language models with limited GPU memory (e.g., a 16 GB T4 GPU or a 24 GB RTX 3090 gaming card). It is a research project developed by HazyResearch@Stanford, SkyComputing@UC Berkeley, DS3Lab@ETH Zurich, CRFM@Stanford, and TogetherCompute.

More: https://github.com/FMInference/FlexGen
