BIRD VES Leaderboard [Legacy]

As of August 4th, 2024, the BIRD team will stop using the Valid Efficiency Score (VES) as the efficiency metric for submission evaluation. The VES metric places no upper bound on the time ratio, which can result in misleading evaluations, especially when most predicted SQL queries are inherently more time-consuming than the ground truth but a few are extremely fast. You can review the VES results for previously submitted models below.
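To make the unbounded-ratio problem concrete, here is a minimal sketch of a VES-style score. It assumes the definition commonly given for VES: each correctly answered query contributes the square root of (ground-truth time / predicted time), incorrect queries contribute zero, and the mean is reported on a 0-100 scale. The function name `ves`, the tuple layout, and the timings are hypothetical and chosen only for illustration; this is not the BIRD evaluation script.

```python
import math


def ves(results):
    """VES-style score under the assumptions stated above.

    results: list of (is_correct, gt_seconds, pred_seconds) tuples.
    Each correct query contributes sqrt(gt_seconds / pred_seconds);
    incorrect queries contribute 0. Nothing caps the ratio.
    """
    total = 0.0
    for is_correct, gt_seconds, pred_seconds in results:
        if is_correct:
            total += math.sqrt(gt_seconds / pred_seconds)
    return 100.0 * total / len(results)


# Nine correct predictions, each twice as slow as the ground truth.
slow = [(True, 1.0, 2.0)] * 9
# One correct prediction that is 10,000 times faster than the ground truth.
outlier = [(True, 1.0, 0.0001)]

print(round(ves(slow), 2))            # about 70.71
print(round(ves(slow + outlier), 2))  # about 1063.64
```

In this made-up example a single extreme outlier lifts the score from roughly 71 to over 1000, even though nine of the ten predicted queries are slower than the ground truth. That is the kind of distortion described above.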

Date	Model	Code	Size	Oracle Knowledge	Dev VES (%)	Test VES (%)
Human Performance	Data Engineers + DB Students			✔️		90.27
May 14, 2024	ExSL + granite-20b-code	IBM Research AI	20B	✔️	75.75	80.40
Jul 22, 2024	Distillery + GPT-4o	Distyl AI Research	UNK	✔️	72.94	77.74
Jul 14, 2024	RECAP + Gemini	Google Cloud	UNK	✔️	–	76.11
Jul 2, 2024	ByteBrain	ByteDance Infra Lab	33B	✔️	65.80	73.24
May 24, 2024	ExSL + granite-20b-code	IBM Research AI	20B		66.34	72.78
May 21, 2024	CHESS	link Talaei et al.’24	UNK	✔️	65.43	72.63
Jan 14, 2024	MCS-SQL + GPT-4	Dunamu	UNK	✔️	64.82	71.35
Apr 10, 2024	GRA-SQL	Tencent CDP-youpu	UNK	✔️	67.55	69.56
Feb 27, 2024	PB-SQL	Seoul National University	UNK	✔️	71.31	68.90
Jul 5, 2024	Insights AI	Uber Freight	UNK	✔️	–	68.82
Apr 8, 2024	OpenSearch-SQL, v1 + GPT-4	Alibaba Cloud	UNK	✔️	68.38	68.80
Nov 21, 2023	MAC-SQL + GPT-4	Wang et al. ’23 BUAA & Tencent	UNK	✔️	58.76	67.68
Jun 1, 2024	SuperSQL	link Li et al. ’24	UNK	✔️	61.99	67.66
Jun 7, 2024	SFT CodeS-15B + SQLFixAgent	Soochow University	UNK	✔️	–	67.24
Feb 27, 2024	DTS-SQL + DeepSeek 7B	link Pourreza et al. ’24	7B	✔️	60.31	64.52
Oct 12, 2023	SFT CodeS-15B	link Li et al. SIGMOD’24	15B	✔️	59.87	64.22
Mar 27, 2024	{Chat2Query} (GPT-4 + data entity modeling) (PingCAP)	link PingCAP	UNK	✔️	–	63.89
Oct 12, 2023	SFT CodeS-7B	link Li et al. SIGMOD’24	7B	✔️	58.80	63.62
Nov 16, 2023	Dubo-SQL, v1	Mercator Technologies	UNK	✔️	66.01	63.00
Nov 9, 2023	DAIL-SQL + GPT-4	link Gao and Wang et al. VLDB’24	UNK	✔️	56.08	61.95
Jul 1, 2023	GPT-4	link Baseline	UNK	✔️	49.77	60.77
Aug 15, 2023	DIN-SQL + GPT-4	link Pourreza et al. ’23	UNK	✔️	58.79	59.44
Mar 17, 2023	ChatGPT + CoT	link Li et al. NeurIPS’23	UNK	✔️	42.30	56.56
Mar 17, 2023	ChatGPT	Baseline	UNK	✔️	43.81	51.40
Mar 17, 2023	ChatGPT + CoT	link Li et al. NeurIPS’23	UNK		32.33	49.69
Nov 23, 2023	OPEN-SQL	Anonymous	7B	✔️	41.56	48.08
Feb 17, 2023	Codex	Baseline	175B	✔️	43.41	41.60
Mar 17, 2023	ChatGPT	Baseline	UNK		27.97	36.68
Feb 17, 2023	Codex	Baseline	175B		33.37	35.40
Feb 5, 2023	T5-3B	Baseline	3B	✔️	25.57	27.80
Feb 3, 2023	T5-Large	Baseline	770M	✔️	22.74	25.00
Feb 5, 2023	T5-3B	Baseline	3B		13.62	15.17
Feb 3, 2023	T5-Base	Baseline	220M	✔️	12.90	14.70
Feb 3, 2023	T5-Large	Baseline	770M		9.90	12.25
Feb 3, 2023	T5-Base	Baseline	220M		7.78	8.97