
Over the last year, we have made considerable progress towards composable systems for working with machine learning’s core asset: data.
https://data.mlr.press/assets/pdf/v01-5.pdf
Croissant, a new format for ML-ready datasets, has seen the light of day, with integrations across Hugging Face, Kaggle and OpenML, loaders for TensorFlow, PyTorch, JAX and Keras, and more than 400k datasets indexed and accessible to date.
– https://github.com/mlcommons/croissant
– https://arxiv.org/abs/2403.19546
A burgeoning DMLR (Data-Centric Machine Learning Research) ecosystem has emerged, complete with back-to-back workshops at NeurIPS, ICML and ICLR and an ambitious journal.
In this workshop, we are taking the first steps to cross-pollinate these ecosystems, matching advanced data-centric methods and infrastructure with high-impact public good expertise.
On Friday, May 31, we will meet in Geneva to deliberate over a roadmap for one such domain, namely healthcare.
The goals of our meeting are threefold:
1) Take inventory of the assets we collectively work on in this space
2) Align on our vision
3) Define a roadmap for the adoption of the Croissant Health Extension (or a similar format)
This will be a no-nonsense, get-things-done workshop. We will meet from 9:30 AM to 1:30 PM, keeping a tight beat with a few coffee breaks in between. The schedule is split into two main components.
- A) Round-robin inventory presentations: every speaker will give a short 15-minute summary of their corner of the universe
- B) Roadmap discussion: we will define targets for format adoption and dissemination, plus any serendipitous ideas that come up during the discussion