TODOs:

  • Support more engines
  • Support multi-instance/multi-node frameworks
  • Dashboard: Live dashboard for engines and GPUs, showing real-time performance metrics of different inference engines with Prometheus + Grafana
  • Dashboard/Leaderboard: Multi-select sub-runs and generate a single command line to rerun them, plus data visualization and a report
  • Make leaderboard results shareable through a URL
  • Combine leaderboard and dashboard (clicking a leaderboard record opens that sub-run's detail page)
  • Generate reports (PDF/Markdown) for all or selected experiments
  • Support local model paths (while still tracking the Hugging Face repo name for the leaderboard)
  • Benchmark: each benchmark type should have its own config per model (mainly restricted by the model's context length)
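
For the live-dashboard item, metrics could be published in Prometheus's text exposition format for Grafana to chart. A minimal stdlib-only sketch; the metric name and labels are illustrative assumptions, not the project's real schema:

```python
# Sketch: render one sample in Prometheus text exposition format.
# Metric and label names here are hypothetical.
def format_metric(name: str, labels: dict, value: float) -> str:
    """Render `name{k="v",...} value` for a Prometheus scrape endpoint."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

# Example: a per-engine, per-GPU throughput sample.
line = format_metric(
    "inference_throughput_tokens_per_s",
    {"engine": "vllm", "gpu": "A100"},
    1200.0,
)
```

In practice a client library such as prometheus_client would manage the registry and HTTP endpoint; this only shows the wire format being produced.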
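
For shareable leaderboard results, one common approach is to encode the selected sub-run IDs in the URL's query string. A hedged stdlib sketch; the `runs` parameter name and URL shape are assumptions:

```python
from urllib.parse import urlencode, parse_qs, urlparse

def share_url(base: str, run_ids: list) -> str:
    """Encode selected sub-run IDs into a shareable leaderboard URL."""
    return base + "?" + urlencode({"runs": ",".join(run_ids)})

def selected_runs(url: str) -> list:
    """Recover the sub-run IDs from a shared URL."""
    query = parse_qs(urlparse(url).query)
    return query.get("runs", [""])[0].split(",")
```

Because the selection lives entirely in the URL, opening a shared link restores the same view with no server-side session state.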
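
The last item, per-model benchmark configs restricted by context length, might reduce to a lookup plus a clamp. A sketch under stated assumptions: the context-window values and the fallback default below are illustrative, and real values should come from each model's config:

```python
# Assumed context windows; real values come from the model's config.
CONTEXT_WINDOW = {
    "meta-llama/Llama-2-7b-hf": 4096,
    "mistralai/Mistral-7B-v0.1": 8192,
}

def clamp_lengths(model: str, prompt_len: int, output_len: int):
    """Shrink prompt/output lengths so the request fits the context window."""
    window = CONTEXT_WINDOW.get(model, 2048)  # conservative fallback
    prompt_len = min(prompt_len, window - 1)  # leave room for >= 1 output token
    output_len = min(output_len, window - prompt_len)
    return prompt_len, output_len
```

A benchmark type that requests 4000 prompt tokens and 512 output tokens would, for a 4096-token model, keep the prompt but cut the output to 96 tokens.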