Now you may speak SQL Freely!

09/21/2020

SFDC launches “Photon” - Natural Language Interface for Databases

Natural language processing (NLP) has evolved into a tool that we all use, in various forms, every day. Virtual assistants such as Siri, Cortana, and Google Assistant (“Hey Google”) have long been a staple of daily life, answering the questions we put to them. Researchers and scientists at Salesforce (SFDC) and the Chinese University of Hong Kong now hope that NLP can be extended to programming languages as well.

Research scientists at Salesforce and the Chinese University of Hong Kong have released “Photon”, a natural language interface to databases (NLIDB). The team used deep learning to build a parser that achieves about 63% accuracy on a common benchmark, along with an error-detecting module that prompts users to clarify ambiguous questions.

At the recent ACL 2020 conference, the team demonstrated the key features behind Photon. At its core is a neural-network-based semantic parser that converts natural-language questions from a human user into SQL queries. The parser achieves about 63.2% exact-match accuracy on the Spider dataset, the second-highest result achieved to date.

The goal of an NLIDB is to "democratize" the ability to extract useful data from relational databases: users ask questions in natural language instead of constructing a query in a programming language such as SQL. Like many such systems, Photon uses a strategy called semantic parsing, which converts the natural-language question into a logical form, essentially translating human language into programming-language statements. Photon's parser is based on a neural network whose input is a natural-language question concatenated with the database schema, and whose output is an SQL query.
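To make the idea of "a question concatenated with the database schema" concrete, here is a minimal Python sketch of how such an input could be flattened into a single string. The function name serialize_input and the [CLS], [SEP], [T] and [C] markers are assumptions chosen for illustration; the exact format Photon uses is not described here.

```python
from typing import Dict, List


def serialize_input(question: str, schema: Dict[str, List[str]]) -> str:
    """Flatten a question and a schema into one string for an encoder.

    The marker tokens are hypothetical: [T] precedes a table name and
    [C] precedes a column name. They are not Photon's documented format.
    """
    parts = ["[CLS]", question.strip(), "[SEP]"]
    for table, columns in schema.items():
        parts.extend(["[T]", table])
        for column in columns:
            parts.extend(["[C]", column])
    parts.append("[SEP]")
    return " ".join(parts)


if __name__ == "__main__":
    # Toy schema, invented for this example.
    schema = {
        "singer": ["singer_id", "name", "age"],
        "concert": ["concert_id", "year"],
    }
    print(serialize_input("How many singers are older than 30?", schema))
```

A real system would pass this string through a tokenizer before encoding it; the sketch only shows that the model sees the table and column names alongside the question, not the rows stored in the tables.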

The parser does not have access to the complete content of the database, but for categorical columns it does have access to the possible values. The parser consists of a pre-trained BERT model and a series of LSTM sub-networks. Photon then performs beam-search decoding of the network output and applies a static SQL correctness check to the results, a step that improves its results on the Spider dataset by approximately 5%.
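The filtering step can be pictured as follows. This is a sketch under stated assumptions, not Photon's implementation: it takes candidate queries with scores (as a beam search would produce), and keeps the best-scoring one that compiles against an empty SQLite copy of the schema. Using SQLite's EXPLAIN as the correctness check, and the function name first_valid_query, are choices made for this example.

```python
import sqlite3
from typing import List, Optional, Tuple


def first_valid_query(
    candidates: List[Tuple[str, float]], schema_ddl: str
) -> Optional[str]:
    """Return the highest-scoring candidate that compiles against the schema.

    candidates: (sql, score) pairs, e.g. from beam search (higher is better).
    schema_ddl: CREATE TABLE statements; the tables stay empty.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
        for sql, _score in ranked:
            try:
                # EXPLAIN compiles the statement without running the query,
                # so syntax errors and unknown tables/columns are caught.
                conn.execute("EXPLAIN " + sql)
                return sql
            except (sqlite3.Error, sqlite3.Warning):
                continue
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    ddl = "CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER);"
    beam = [
        ("SELECT nme FROM singer", -0.2),   # best score, but unknown column
        ("SELECT name FROM singer", -0.5),  # valid
        ("SELEC name FROM singer", -0.9),   # syntax error
    ]
    print(first_valid_query(beam, ddl))
```

The point of the design is that the top-scoring candidate is not always well-formed, so checking the beam in score order lets a lower-ranked but valid query win.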

To improve the robustness of the system, Photon includes a "human-in-the-loop" question corrector. The corrector relies on a second neural network, a classifier that acts as a confusion detector: it determines whether a question cannot be accurately translated to SQL. The researchers trained the classifier on a synthetic dataset, which they constructed by applying "swap" and "drop" operations to translatable questions. The confusion detector also identifies the particular portions of a question (spans) that are confusing. These spans are used to suggest corrections, which are fed back to the user via a chat interface.
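As a rough illustration of how "swap" and "drop" operations could turn one translatable question into labeled training examples, consider the Python sketch below. The details are assumptions made for the example: the span is given as token indices, "drop" deletes the span, "swap" replaces it with an unrelated phrase, and the label 1 marks a question as untranslatable. The researchers' actual procedure may differ.

```python
from typing import List, Tuple

# (question text, label) where label 1 means "cannot be translated to SQL".
Example = Tuple[str, int]


def drop_span(tokens: List[str], start: int, end: int) -> List[str]:
    """Remove tokens[start:end] from the question."""
    return tokens[:start] + tokens[end:]


def swap_span(
    tokens: List[str], start: int, end: int, replacement: List[str]
) -> List[str]:
    """Replace tokens[start:end] with an unrelated phrase."""
    return tokens[:start] + replacement + tokens[end:]


def make_examples(
    question: str, span: Tuple[int, int], replacement: str
) -> List[Example]:
    """Build one positive and two synthetic negative examples."""
    tokens = question.split()
    start, end = span
    dropped = " ".join(drop_span(tokens, start, end))
    swapped = " ".join(swap_span(tokens, start, end, replacement.split()))
    return [(question, 0), (dropped, 1), (swapped, 1)]


if __name__ == "__main__":
    # "singers" (token index 2) is the span that refers to the schema.
    for text, label in make_examples(
        "How many singers are older than 30", (2, 3), "engines"
    ):
        print(label, text)
```

Because the modified span is known at construction time, the same examples could also supervise the detection of confusing spans.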

When the question corrector detects that the human input cannot be translated into SQL, it initiates a dialog with the user to further refine the question, using a "chat-bot" style interface. Expert users can also input queries directly as SQL.

A demo version of Photon is available to the public. Xi Victoria Lin, one of the researchers behind Photon, says that future work includes "voice input, auto-completion, and visualization of the output," but no dates for these features have been announced.

Idea Helix is a Salesforce Certified Silver Partner. For more details, follow us on Facebook, or send us a note through ideahelix.com or at info@ideahelix.com.