Danijel puts a focus on the power of open source to reduce climate impact of AI training and inference and presents some tools:
- vLLM - a high-throughput FOSS LLM inference engine
- Provides PagedAttention, advanced KV-cache management, quantization, and much more out of the box
- Can balance performance with size and cost reductions
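The core idea behind vLLM's PagedAttention can be sketched in a few lines: instead of reserving one contiguous KV-cache region per request (sized for the maximum sequence length), the cache is split into fixed-size blocks and each sequence keeps a block table mapping token positions to physical blocks. This is a simplified illustration of the concept, not vLLM's actual implementation; the class and block size here are made up for the example.

```python
BLOCK_SIZE = 4  # tokens per cache block (vLLM uses larger blocks, e.g. 16)

class PagedKVCache:
    """Toy paged KV-cache: non-contiguous, allocated on demand."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        table = self.block_tables.setdefault(seq_id, [])
        if pos % BLOCK_SIZE == 0:
            # Current block is full (or this is the first token):
            # grab any free physical block, contiguity not required.
            table.append(self.free_blocks.pop())
        # Translate logical position -> (physical block, slot in block)
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
slots = [cache.append_token("req-1", pos) for pos in range(6)]
# 6 tokens occupy only 2 blocks; nothing is pre-reserved for max length,
# which is where the memory (and thus hardware/energy) savings come from.
```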
- LLM Compressor
- Granular control of quantization and pruning, down to per-tensor (and similar) detail levels
- Out-of-the-box "best practice" defaults for achieving savings quickly
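To make "per-tensor quantization" concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python, i.e. the kind of transformation LLM Compressor automates. This is not LLM Compressor's API, just the underlying idea: one scale per tensor, weights rounded to 8-bit integers.

```python
def quantize_per_tensor(weights):
    """Symmetric int8 quantization: one shared scale for the whole tensor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_per_tensor(w)
approx = dequantize(q, s)
# Weights now fit in 1 byte each instead of 4 (fp32), at the cost of a
# small rounding error bounded by the scale. Finer granularities
# (per-channel, per-group) trade more metadata for less error.
```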
- llm-d
- Complete routing architecture for distributed models - FOSS
- For instance, serve smaller models for mathematical requests and bigger models for essay-like requests, picking the smallest and most apt model for each job
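The routing idea above can be sketched as a toy classifier-plus-lookup. Note that this is a deliberately simplified illustration: the model names and the keyword classifier are invented for this example, and real llm-d does far more (distributed, cache-aware serving) than a dictionary lookup.

```python
# Hypothetical model names, purely illustrative
ROUTES = {
    "math": "small-math-model-3b",       # cheap, apt for short math tasks
    "essay": "large-general-model-70b",  # expensive, for open-ended prose
}

def classify(prompt):
    """Crude request classifier (stand-in for a real intent classifier)."""
    math_markers = ("calculate", "solve", "+", "=", "integral")
    return "math" if any(m in prompt.lower() for m in math_markers) else "essay"

def route(prompt):
    """Pick the smallest model that fits the request type."""
    return ROUTES[classify(prompt)]
```

Routing cheap requests away from the largest model is exactly where the energy savings come from: most tokens never touch the big model.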
If you want to dive deeper, all slides are available on the ecoCompute website: https://www.eco-compute.io/files/slides_2025/01_Thursday/03_Community/04_Soldo_Paint_It_Green.pdf