Cohere for AI Enhances Large Language Models (LLMs) with Active Inheritance: Steering Synthetic Data Generation for Optimal Performance and Reduced Bias (2024)


Synthetic data generation is gaining prominence in machine learning. The technique can produce large datasets when real-world data is scarce or expensive to collect. By generating synthetic data, researchers can train models more effectively and improve their performance across a range of applications. The generated data is crafted to exhibit specific characteristics that benefit the models’ learning process.

However, integrating synthetic data into machine learning models presents several challenges, particularly regarding the biases and attributes the synthetic data may introduce. Understanding how these inherited characteristics affect the behavior and performance of large language models (LLMs) is crucial. The primary concern is whether synthetic data can introduce unintended biases or other attributes that affect the model’s outputs. This understanding is vital for ensuring that models trained with synthetic data are effective and fair, and that they do not perpetuate negative traits from the data generation process.

Current methods for optimizing the data space include data augmentation, pseudo-labeling, data weighting, data pruning, and curriculum learning. Data augmentation expands datasets by creating modified versions of existing data. Pseudo-labeling generates labels for unlabeled data, effectively expanding the dataset. Data weighting assigns different importance to different data points, and data pruning removes less useful data, improving the quality of the remaining dataset. Curriculum learning structures the training process by gradually introducing more complex data. Despite their utility, these methods are limited by the properties inherent in the initial datasets. They often cannot introduce new, desirable attributes, which restricts their effectiveness in optimizing models for specific characteristics.
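To make that limitation concrete, here is a minimal Python sketch of data pruning and data weighting. It is an illustration only, not code from the paper: the `Example` type, the quality scores, and the function names `prune` and `normalized_weights` are all hypothetical. Both operations only select or re-rank examples that already exist, so neither can add an attribute the initial dataset lacks.

```python
from typing import List, Tuple

# Hypothetical representation: (text, quality score in [0, 1]).
# How the score is obtained is an assumption left outside this sketch.
Example = Tuple[str, float]


def prune(data: List[Example], threshold: float) -> List[Example]:
    """Data pruning: keep only examples whose score meets the threshold."""
    return [(text, score) for text, score in data if score >= threshold]


def normalized_weights(data: List[Example]) -> List[float]:
    """Data weighting: turn scores into sampling weights that sum to 1."""
    if not data:
        return []
    total = sum(score for _, score in data)
    if total == 0:
        # Fall back to uniform weights when every score is zero.
        return [1.0 / len(data)] * len(data)
    return [score / total for _, score in data]


if __name__ == "__main__":
    dataset = [("short reply", 0.2), ("detailed answer", 0.9), ("ok answer", 0.6)]
    kept = prune(dataset, threshold=0.5)
    print(kept)
    print(normalized_weights(kept))
```

Every text in the pruned or weighted set comes from the original dataset, which is the sense in which these methods are bounded by the data they start from.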

Researchers from Cohere for AI and Cohere have proposed a novel concept called “active inheritance.” This method aims to intentionally steer synthetic data generation towards specific non-differentiable objectives, such as high lexical diversity and low toxicity. By guiding the data generation process, researchers can directly influence the characteristics of the resulting models. Active inheritance involves selecting proxy labels based on desired characteristics, generating multiple samples for each prompt, and choosing the sample that maximizes the desired attribute. This approach, known as targeted sampling, allows for fine-tuning models towards specific goals using synthetic datasets curated to enhance these attributes.
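The targeted sampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `targeted_sample`, `build_dataset`, and `make_toy_generator` are hypothetical names, the toy generator stands in for a real LLM call, and type-token ratio is used here as one simple proxy for lexical diversity. Any other scalar scorer could be swapped in, for example the negated output of a toxicity classifier.

```python
import itertools
from typing import Callable, Dict, List


def type_token_ratio(text: str) -> float:
    """Proxy for lexical diversity: unique tokens divided by total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def targeted_sample(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str], float],
    k: int = 4,
) -> str:
    """Draw k candidate completions and keep the one with the highest score."""
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=score)


def build_dataset(
    prompts: List[str],
    generate: Callable[[str], str],
    score: Callable[[str], float],
    k: int = 4,
) -> List[Dict[str, str]]:
    """Build a fine-tuning set of (prompt, best-scoring completion) pairs."""
    return [
        {"prompt": p, "completion": targeted_sample(p, generate, score, k)}
        for p in prompts
    ]


def make_toy_generator(pool: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: cycles through a fixed pool of outputs."""
    cycle = itertools.cycle(pool)
    return lambda prompt: next(cycle)


if __name__ == "__main__":
    pool = ["the cat the cat the cat", "a quick brown fox jumps", "good good good day"]
    generate = make_toy_generator(pool)
    print(build_dataset(["Describe an animal."], generate, type_token_ratio, k=3))
```

Because the scorer is only used to rank finished samples, it does not need to be differentiable, which is what allows objectives such as lexical diversity or toxicity to guide the data.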


The active inheritance method has shown significant promise. In the reported results, targeted sampling steered model behavior towards the desired attributes: models showed gains of up to 116% in generation length and 43% in linguistic diversity, and toxicity fell by up to 40%. These results highlight the potential of active inheritance to enhance the quality and safety of language models. By focusing on specific characteristics, researchers can encourage desirable traits in a model while minimizing negative ones.


The study also examined passive inheritance, in which models inherit properties from the synthetic data without explicit guidance, and how it affects model performance. The research found that models are sensitive to the properties of the synthetic data they are trained on, even when the data prompts appear neutral. This sensitivity raises concerns about introducing unintended biases and attributes into the models. The findings underscore the importance of carefully curating synthetic data to avoid undesirable outcomes.


In conclusion, the research underscores the significant impact of synthetic data on the attributes of large language models. By introducing the concept of active inheritance, researchers from Cohere have provided a robust framework for steering synthetic data generation towards desirable characteristics. This method enhances specific attributes, such as lexical diversity and reduced toxicity, ensuring that models trained with synthetic data are effective and safe. The study’s results demonstrate that it is possible to successfully and efficiently instill desired attributes into a model’s generation with minimal effort. Active inheritance represents a promising approach to optimizing machine learning models, offering a pathway to more sophisticated and reliable AI systems.

Check out the Paper. All credit for this research goes to the researchers of this project.

Asif Razzaq


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
