As of August 4th, 2024, the BIRD team will stop using the Valid Efficiency Score (VES) as the efficiency metric for submission evaluation. The VES metric has no upper bound on the time ratio, which can result in misleading evaluations, especially when most predicted SQL queries are inherently more time-consuming but a few are extremely fast compared with the ground truth. You can review the VES results for previously submitted models below.

| Date | Model | Code | Size | Oracle Knowledge | Dev | Test |
|---|---|---|---|---|---|---|
| | Human Performance Data Engineers + DB Students | | | ✔️ | | 90.27 |
| May 14, 2024 | ExSL + granite-20b-code IBM Research AI | | 20B | ✔️ | 75.75 | 80.40 |
| Jul 22, 2024 | Distillery + GPT-4o Distyl AI Research | | UNK | ✔️ | 72.94 | 77.74 |
| Jul 14, 2024 | RECAP + Gemini Google Cloud | | UNK | ✔️ | | 76.11 |
| Jul 2, 2024 | ByteBrain ByteDance Infra Lab | | 33B | ✔️ | 65.80 | 73.24 |
| May 24, 2024 | ExSL + granite-20b-code IBM Research AI | | 20B | | 66.34 | 72.78 |
| May 21, 2024 | CHESS Talaei et al.’24 | link | UNK | ✔️ | 65.43 | 72.63 |
| Jan 14, 2024 | MCS-SQL + GPT-4 Dunamu | | UNK | ✔️ | 64.82 | 71.35 |
| Apr 10, 2024 | GRA-SQL Tencent CDP-youpu | | UNK | ✔️ | 67.55 | 69.56 |
| Feb 27, 2024 | PB-SQL Seoul National University | | UNK | ✔️ | 71.31 | 68.90 |
| Jul 5, 2024 | Insights AI Uber Freight | | UNK | ✔️ | | 68.82 |
| Apr 08, 2024 | OpenSearch-SQL,v1 + GPT-4 Alibaba Cloud | | UNK | ✔️ | 68.38 | 68.80 |
| Nov 21, 2023 | MAC-SQL + GPT-4 Wang et al. ’23 BUAA & Tencent | | UNK | ✔️ | 58.76 | 67.68 |
| Jun 1, 2024 | SuperSQL Li et al. ’24 | link | UNK | ✔️ | 61.99 | 67.66 |
| Jun 7, 2024 | SFT CodeS-15B + SQLFixAgent Soochow University | | UNK | ✔️ | | 67.24 |
| Feb 27, 2024 | DTS-SQL + DeepSeek 7B Pourreza et al. ’24 | link | 7B | ✔️ | 60.31 | 64.52 |
| Oct 12, 2023 | SFT CodeS-15B Li et al. SIGMOD’24 | link | 15B | ✔️ | 59.87 | 64.22 |
| Mar 27, 2024 | {Chat2Query} (GPT-4 + data entity modeling) (PingCAP) PingCAP | link | UNK | ✔️ | | 63.89 |
| Oct 12, 2023 | SFT CodeS-7B Li et al. SIGMOD’24 | link | 7B | ✔️ | 58.80 | 63.62 |
| Nov 16, 2023 | Dubo-SQL, v1 Mercator Technologies | | UNK | ✔️ | 66.01 | 63.00 |
| Nov 9, 2023 | DAIL-SQL + GPT-4 Gao and Wang et al. VLDB’24 | link | UNK | ✔️ | 56.08 | 61.95 |
| Jul 1, 2023 | GPT-4 Baseline | link | UNK | ✔️ | 49.77 | 60.77 |
| Aug 15, 2023 | DIN-SQL + GPT-4 Pourreza et al. ’23 | link | UNK | ✔️ | 58.79 | 59.44 |
| Mar 17, 2023 | ChatGPT + CoT Li et al. NeurIPS’23 | link | UNK | ✔️ | 42.30 | 56.56 |
| Mar 17, 2023 | ChatGPT Baseline | | UNK | ✔️ | 43.81 | 51.40 |
| Mar 17, 2023 | ChatGPT + CoT Li et al. NeurIPS’23 | link | UNK | | 32.33 | 49.69 |
| Nov 23, 2023 | OPEN-SQL Anonymous | | 7B | ✔️ | 41.56 | 48.08 |
| Feb 17, 2023 | Codex Baseline | | 175B | ✔️ | 43.41 | 41.60 |
| Mar 17, 2023 | ChatGPT Baseline | | UNK | | 27.97 | 36.68 |
| Feb 17, 2023 | Codex Baseline | | 175B | | 33.37 | 35.40 |
| Feb 5, 2023 | T5-3B Baseline | | 3B | ✔️ | 25.57 | 27.80 |
| Feb 3, 2023 | T5-Large Baseline | | 770M | ✔️ | 22.74 | 25.00 |
| Feb 5, 2023 | T5-3B Baseline | | 3B | | 13.62 | 15.17 |
| Feb 3, 2023 | T5-Base Baseline | | 220M | ✔️ | 12.90 | 14.70 |
| Feb 3, 2023 | T5-Large Baseline | | 770M | | 9.90 | 12.25 |
| Feb 3, 2023 | T5-Base Baseline | | 220M | | 7.78 | 8.97 |
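
To illustrate why an unbounded time ratio can mislead, here is a minimal sketch of a VES-style score. It assumes the per-query reward is the square root of ground-truth time divided by predicted time, counted only for correct queries, and scaled to a percentage; the function name `ves`, the tuple layout, and the timings are hypothetical and are not taken from the BIRD evaluation code.

```python
import math

def ves(results):
    """VES-style score over (is_correct, gold_time, pred_time) tuples.

    Assumed form: mean of sqrt(gold_time / pred_time) over all queries,
    with incorrect queries contributing 0, scaled by 100.
    """
    total = 0.0
    for is_correct, gold_time, pred_time in results:
        if is_correct:
            # No upper bound here: a very fast prediction yields a huge reward.
            total += math.sqrt(gold_time / pred_time)
    return 100.0 * total / len(results)

# Nine correct predictions, each 4x slower than the ground truth.
slow = [(True, 1.0, 4.0)] * 9
# One correct prediction that is 400x faster than the ground truth.
outlier = [(True, 400.0, 1.0)]

score_slow = ves(slow)
score_with_outlier = ves(slow + outlier)
print(score_slow, score_with_outlier)
```

In this made-up example, the single fast outlier lifts the score well above 100 even though nine of the ten predictions are slower than the ground truth, which is the kind of distortion the announcement describes.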