transformers
torch
accelerate
bitsandbytes
llama-cpp-python --config-settings=cmake.args="-DGGML_CUDA=on"