Reinforcement - Search News

DeepSeek-R1’s bold bet on reinforcement learning: How it outpaced OpenAI at 3% of the cost

DeepSeek-R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to ...

18h

DeepSeek R1 Replicated for $30 By Researchers at UC Berkeley

UC Berkeley replicates DeepSeek R1 for $30, proving advanced AI can be affordable. Discover how this breakthrough is ...

devdiscourse 1d

The silent saboteur: Action-level backdoor attacks in deep reinforcement learning

To counter the sophisticated threats posed by advanced backdoor frameworks like UNIDOOR, the study underscores the importance of implementing proactive and robust security measures for DRL systems.

unite 5d

DeepSeek-R1: Transforming AI Reasoning with Reinforcement Learning

DeepSeek-R1 is the groundbreaking reasoning model introduced by China-based DeepSeek AI Lab. This model sets a new benchmark ...

Positive reinforcement training leads to better results, happier dogs

Webster, the word “aversive” means “tending to avoid or causing avoidance of a noxious or punishing stimulus.” Does that sound like a training method you’d want to use on ...

Barcelona to prioritize reinforcement in crucial position next summer – report

Barcelona’s plans for next season are shaping up as the club shifts focus towards strengthening its squad. While the search ...

news.crunchbase 8d

Reinforcement Learning From Human Feedback Took Travel AI Tool To Near-Perfect Accuracy

Improving AI performance through reinforcement learning from human feedback added a travel assistant feature to travel ...

1d on MSN

Metro pier reinforcement collapses in Mumbai's Chembur society

A reinforcement cage collapsed during construction on Metro Line 4 in Suman Nagar, Chembur, but no injuries were reported.

12d

Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost

The company developed DeepSeek-R1 by using pure reinforcement learning on top of DeepSeek-V3-Base, and matched or beat o1 on some benchmarks.
