News
Microsoft is open-sourcing an optimized version of Google's BERT that uses ONNX Runtime to accelerate language-model inference on both CPUs and GPUs.
NVIDIA's immensely powerful DGX SuperPOD trains BERT-Large in a record-breaking 53 minutes and trains GPT-2 8B, at 8.3 billion parameters the world's largest transformer-based network.