Interesting AI Ex...ghts

July 23, 2024 DSPy with GPT-4o-mini on MMLU-Pro

DSPy is a framework for programming language models whose optimizers automatically tune prompts and few-shot demonstrations. This post showcases the framework and demonstrates how to use its optimizers to improve GPT-4o-mini, a cost-effective model, on MMLU-Pro, a harder successor to MMLU with more complex questions and an expanded set of answer choices. The evaluation metric checks whether the model's response matches the ground-truth answer.
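
A minimal sketch of what that setup could look like, assuming a recent dspy release with GPT-4o-mini as the LM and MMLU-Pro rows already loaded; the field names, the exact-match metric, and the BootstrapFewShot optimizer choice are illustrative assumptions, not the post's exact code:

    import dspy

    # Configure GPT-4o-mini as the underlying LM (reads OPENAI_API_KEY from the environment).
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    # Signature for the MMLU-Pro task: question plus lettered options in, one letter out.
    class AnswerMCQ(dspy.Signature):
        """Answer a multiple-choice question with the letter of the correct option."""
        question = dspy.InputField()
        options = dspy.InputField(desc="lettered answer choices, e.g. 'A) ... B) ...'")
        answer = dspy.OutputField(desc="a single letter such as 'C'")

    program = dspy.ChainOfThought(AnswerMCQ)

    # Metric: a prediction counts only if it matches the gold letter exactly.
    def exact_match(example, pred, trace=None):
        return pred.answer.strip().upper() == example.answer.strip().upper()

    # Build training examples from MMLU-Pro rows (mmlu_pro_rows is an assumed iterable).
    trainset = [
        dspy.Example(question=q, options=o, answer=a).with_inputs("question", "options")
        for q, o, a in mmlu_pro_rows
    ]

    # Let the optimizer bootstrap few-shot demonstrations that raise the metric.
    optimizer = dspy.BootstrapFewShot(metric=exact_match)
    optimized_program = optimizer.compile(program, trainset=trainset)

With the compiled program in hand, a held-out split can then be scored against the same exact_match metric to measure the improvement.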

July 16, 2024 LLMs Evals Thoughts

Evaluating LLMs is essential both for understanding their capabilities and for solving real business problems. A good evaluation needs sufficient high-quality data samples, clear judging criteria, meaningful evaluation tasks, and frequently run private benchmarks, and the process should evolve alongside the models themselves.
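
As one concrete illustration of "clear judging criteria", a minimal exact-match harness might look like the following; this is purely illustrative, and load_samples / ask_model stand in for your own data loader and model client:

    # Illustrative eval loop: exact-match judging over a private sample set.
    # load_samples and ask_model are hypothetical placeholders.

    def normalize(text: str) -> str:
        # Judging criterion: case- and whitespace-insensitive exact match.
        return " ".join(text.strip().lower().split())

    def evaluate(samples) -> float:
        correct = sum(
            normalize(ask_model(s["prompt"])) == normalize(s["answer"])
            for s in samples
        )
        return correct / len(samples)

    # accuracy = evaluate(load_samples("private_benchmark.jsonl"))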

July 14, 2024 How to use Yi-Vision with TextGrad

TextGrad is an autograd engine that improves language-model outputs through iterative natural-language feedback, and it has recently added support for multimodal optimization. This guide explains how to adapt TextGrad to other providers, using Yi-Vision as the example. The changes are small: tweak one script, add the model name to the supported-model list, and instantiate ChatExternalClient with your API key. The example code loads an image, asks a question about it, defines a loss function to evaluate the answer, and runs an optimizer to improve it.
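
In outline, the wiring might look like the sketch below, which assumes Yi-Vision is reachable through an OpenAI-compatible endpoint and follows the shape of TextGrad's multimodal tutorial; the base URL, environment variable, image URL, and whether ChatExternalClient passes image inputs through unchanged all depend on your TextGrad version, so treat every specific here as an assumption:

    import os
    import httpx
    import textgrad as tg
    from openai import OpenAI
    from textgrad.autograd import MultimodalLLMCall
    from textgrad.loss import ImageQALoss
    from textgrad.engine.local_model_openai_api import ChatExternalClient

    # Point an OpenAI-compatible client at the Yi endpoint (base URL is an assumption).
    client = OpenAI(api_key=os.environ["YI_API_KEY"], base_url="https://api.lingyiwanwu.com/v1")
    engine = ChatExternalClient(client=client, model_string="yi-vision")
    tg.set_backward_engine(engine, override=True)

    # Load an image and pose a question about it.
    image_data = httpx.get("https://example.com/photo.jpg").content  # placeholder URL
    image = tg.Variable(image_data, role_description="image to answer a question about", requires_grad=False)
    question = tg.Variable("What is unusual about this image?", role_description="question", requires_grad=False)
    response = MultimodalLLMCall(engine)([image, question])

    # The loss critiques the answer; TGD feeds that textual gradient back into it.
    loss_fn = ImageQALoss(
        evaluation_instruction="Does the answer describe the image accurately and completely? Criticize.",
        engine=engine,
    )
    optimizer = tg.TGD(parameters=[response])
    loss = loss_fn(question=question, image=image, response=response)
    loss.backward()
    optimizer.step()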

July 9, 2024 How to use DeepSeek with TextGrad

TextGrad is not limited to OpenAI models: it can also run its optimization loop on top of providers such as DeepSeek.
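
Since DeepSeek exposes an OpenAI-compatible API, the hookup is only a few lines. A sketch, assuming the deepseek-chat model name and the https://api.deepseek.com base URL (both documented by DeepSeek, but verify against your account):

    import os
    import textgrad as tg
    from openai import OpenAI
    from textgrad.engine.local_model_openai_api import ChatExternalClient

    # DeepSeek's API is OpenAI-compatible, so the stock OpenAI client works as-is.
    client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")
    engine = ChatExternalClient(client=client, model_string="deepseek-chat")
    tg.set_backward_engine(engine, override=True)

    # From here TextGrad runs as usual: forward pass, textual loss, TGD step.
    answer = tg.Variable("A draft answer to refine.", role_description="answer to optimize", requires_grad=True)
    loss_fn = tg.TextLoss("Critique this answer for correctness and concision.")
    optimizer = tg.TGD(parameters=[answer])
    loss = loss_fn(answer)
    loss.backward()
    optimizer.step()
    print(answer.value)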

July 2024 LLMs Evaluation Benchmarks

Evaluation benchmarks for Large Language Models (LLMs) are continually revised to keep pace with the models' evolving capabilities. This post surveys several commonly referenced evaluation datasets that probe different aspects of LLMs, including math and reasoning, truthfulness, code comprehension, instruction following, and more.