Discussion about this post

Mike Caulfield

Of all these, I do believe that learning and developing tools and methods to evaluate things like relevant retrieval and appropriate/useful summary are crucial. One of the reasons I focus so much on contextualization and verification is that they are very amenable to the development of model responses and response rubrics, which can be compared against output to better understand the weaknesses and strengths of various systems and, even more importantly, the specific conditions under which they fail.

I am not even sure this has to be done at the highest level of formality (unless you are looking to publish). But I find it odd when people in this space do *not* have at least a half dozen ready-to-go challenges to test the behavior of a given system along multiple dimensions.
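The kind of informal challenge set described above could be sketched in a few lines. This is only an illustration, not anyone's actual method: the challenge prompts and rubric phrases are invented, and the scoring here is naive substring matching, where a real evaluation would rely on human judgment or a stronger matcher.

```python
# Hypothetical sketch of a rubric-based challenge harness.
# Each challenge pairs a prompt with a rubric: phrases a good
# response should contain.

def score(output, rubric):
    """Return the fraction of rubric phrases found in the output."""
    text = output.lower()
    hits = sum(1 for phrase in rubric if phrase.lower() in text)
    return hits / len(rubric)

# A "half dozen ready-to-go challenges" would each look like this
# (prompts and rubric phrases are made up for illustration):
challenges = [
    {"prompt": "Who published this claim, and when?",
     "rubric": ["primary source", "publication date"]},
    {"prompt": "Summarize the article's main argument.",
     "rubric": ["main claim", "supporting evidence"]},
]
```

Running each system against the same fixed challenges makes it easier to see not just an overall score but the specific conditions under which a system fails.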

Frank Norman

That’s a strong message and call to action, or at least a call for learning. It sounds good.

I wonder, are there different evaluation methods for tools as used by librarians, vs tools as used by end users? It seems to me that these are two quite different situations.

