The sentence-level data contains the following columns:

```julia
8-element Vector{String}:
 "sentence_id"
 "doc_id"
 "date"
 "event_type"
 "label"
 "sentence"
 "score"
 "speaker"
```
JuliaCon 2024
Thursday, July 11, 2024
A lightweight package providing Julia users with easy access to the Trillion Dollar Words dataset and model (Shah, Paturi, and Chava 2023).
Disclaimer
Please note that I am not the author of the Trillion Dollar Words paper, nor am I affiliated with its authors. The package was developed as a by-product of our own research and is not officially endorsed by the paper's authors.
The package grew out of our ICML 2024 paper *Stop Making Unscientific AGI Performance Claims* (Altmeyer et al. 2024) (preprint, blog post, code):
The package provides the following functionality: easy access to more than 40,000 time-stamped sentences from communications by members of the Federal Open Market Committee (FOMC):
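A minimal sketch of loading the sentence-level data; the loader name `load_all_sentences` is taken from the package README, so check the package documentation for the exact API:

```julia
using TrillionDollarWords

# Load all labelled FOMC sentences into a DataFrame
# (function name assumed from the package README).
df = load_all_sentences()

first(df, 5)    # peek at the first five rows
names(df)       # column names, including :sentence, :label, :speaker
```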
A `Transformers.HuggingFace.HGFConfig` can also be passed. Layer-wise activations can be computed as follows:
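A sketch of computing activations for a few query sentences; `load_model` and `layerwise_activations` are the names I believe the package exports, but treat them as assumptions and verify against the docs:

```julia
using TrillionDollarWords

# Load the baseline model from the paper (a RoBERTa-based classifier,
# pulled from HuggingFace under the hood via Transformers.jl).
mod = load_model()

queries = [
    "Inflation is expected to rise.",
    "The committee decided to keep rates unchanged.",
]

# Returns activations for every layer and every query sentence.
acts = layerwise_activations(mod, queries)
```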
We have archived activations for each layer and sentence as artifacts:
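Since computing activations for the full corpus is expensive, the archived artifacts can be pulled instead of recomputing. The loader below is hypothetical (the real artifact API may differ; consult the package docs):

```julia
using TrillionDollarWords

# Hypothetical loader: fetch pre-computed activations for one layer
# from the package's archived artifacts, rather than running the model.
acts = load_activations(; layer = 24)
```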
OK, but why would I need all this? 🤔
“They’re exactly the same.”
A linear probe \(\widehat{cpi}=f(A)\) maps layer activations \(A\) to the consumer price index (CPI). If probe results were indicative of some intrinsic 'understanding', the probe should not be sensitive to unrelated sentences.
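The probe itself is just a linear regression from activations to the target. A self-contained sketch on toy data, using ridge regression as a stand-in for whatever regularisation the paper actually uses:

```julia
using LinearAlgebra

# Fit a ridge-regularised linear probe cpî = f(A) = A * w,
# where A is an n × d matrix of activations and y an n-vector of CPI values.
function fit_probe(A::AbstractMatrix, y::AbstractVector; λ::Real = 1.0)
    d = size(A, 2)
    return (A'A + λ * I(d)) \ (A'y)
end

A = randn(100, 16)                      # toy activations
y = A * randn(16) .+ 0.1 .* randn(100)  # toy target with noise
w = fit_probe(A, y)
ŷ = A * w                               # probe predictions
```

The point of the experiment is that a probe fit on activations of *unrelated* sentences can score similarly well, which cautions against reading probe accuracy as evidence of understanding.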
The package is a good starting point for the following ideas:
Any contributions are very much welcome.
With thanks to my co-authors Andrew M. Demetriou, Antony Bartlett, and Cynthia C. S. Liem and to the audience for their attention.