WebQL
WebQL is a software platform produced by QL2 Software that automates data integration and collection from unstructured and structured sources, including the Web, PDF and Word documents, spreadsheets, email repositories and corporate data stores.
WebQL has been on the market since 2001. The most current version, WebQL 3.1, was released in November 2006. WebQL was named a "Trend Setting Product for 2006" by KM World. WebQL customers include 5 of the top 10 pharmaceutical companies and 7 of the top 10 airlines.
In addition to handling textual content, WebQL supports optical character recognition (OCR), which lets it retrieve text embedded in images.
In many web data integration tasks, the desired data sits on a web page that can be reached only by completing a form. WebQL can populate such forms automatically with variable data to gain access to the “deep” Web. The data can then be extracted by WebQL and transformed into a format suitable for a variety of analytical operations.
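As a rough illustration of what populating a form programmatically involves, the following Python sketch builds (but does not send) the request a browser would issue for a simple GET form. It is not WebQL's mechanism; the function name `build_form_request`, the URL `http://example.com/search` and the field name `q` are assumptions made for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(action_url: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) a GET request that submits form values."""
    # Encode the field values the way a browser does for a GET form.
    query = urlencode(fields)
    # Append to an existing query string if the action URL already has one.
    separator = "&" if "?" in action_url else "?"
    return Request(action_url + separator + query, method="GET")


# Hypothetical search form with a single text field named "q".
request = build_form_request("http://example.com/search", {"q": "wikipedia"})
print(request.full_url)
```

Passing the resulting request to `urllib.request.urlopen` would perform the submission; forms that use POST, cookies or scripting need more than this sketch covers.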
WebQL features novel URL schemes that allow for enhanced flexibility when accessing data sources that are external to WebQL. WebQL also supports XML data of arbitrary size and provides APIs for embedding WebQL in C, Java or .NET programs.
WebQL is driven by a sophisticated programming language similar to standard SQL. The language has a number of operations designed to simplify complex data integration tasks. By providing a virtual database layer, WebQL shields developers from the complexity of specific data formats and network protocols. WebQL programmers can use their existing SQL skills to access, transform and integrate data with minimal effort. WebQL can also be operated by less technical users through WebQL Desktop. In addition to licensing the WebQL software for deployment in a customer’s environment, QL2 Software offers its own solutions, develops solutions built on WebQL technology on behalf of its customers, and hosts them in the company’s secure online data center.
Below are several sample WebQL scripts. While scripts that perform real-world data integration tasks are generally much larger, these scripts give a sense of the language’s capabilities.
The following script examines every document within two links of the QL2 Software home page and retrieves every phrase of the form “the X”:
select item1
from pattern '(the \w+)'
within crawl of http://www.ql2.com/
    to depth 2
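For readers more familiar with general-purpose languages, the same idea can be sketched in Python: a breadth-first crawl limited to two links from the start page, applying the regular expression `(the \w+)` to each page visited. This is only an analogy, not a description of how WebQL executes the script. The `PAGES` dictionary is a hypothetical in-memory stand-in for a website, and the sketch assumes that "depth 2" means following at most two links from the start page.

```python
import re
from collections import deque

# Hypothetical in-memory "site": page name -> (page text, outgoing links).
PAGES = {
    "home": ("Welcome to the site.", ["about", "news"]),
    "about": ("We build the tools.", ["team"]),
    "news": ("Read the latest.", []),
    "team": ("Meet the people.", ["history"]),
    "history": ("Not reached: the archive.", []),
}


def crawl_and_match(start: str, max_depth: int, pattern: str) -> list[str]:
    """Breadth-first crawl from start, collecting regex matches from each page."""
    regex = re.compile(pattern)
    seen = {start}
    queue = deque([(start, 0)])
    matches: list[str] = []
    while queue:
        url, depth = queue.popleft()
        text, links = PAGES[url]
        matches.extend(regex.findall(text))
        # Only follow links while still within the depth limit.
        if depth < max_depth:
            for link in links:
                if link in PAGES and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return matches


print(crawl_and_match("home", 2, r"(the \w+)"))
# → ['the site', 'the tools', 'the latest', 'the people']
```

The "history" page is three links from the start page, so it is never visited and its phrase does not appear in the result.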
The following script searches blogs for discussions about Wikipedia:
select
    URL,
    clean(CONTENT) as TITLE
from
    links
within
    http://blogsearch.google.com
        submitting values 'wikipedia' for 'q'
where
    url_host(URL) not matching 'google'
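The `where` clause filters on the host part of each link rather than on the whole URL. A minimal Python sketch of that kind of filter follows, assuming that "not matching" behaves like a regular-expression search anywhere in the host name; the function name `external_links` and the sample URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def external_links(links: list[str], blocked: str) -> list[str]:
    """Keep only links whose host name does not match the blocked pattern."""
    kept = []
    for url in links:
        # hostname is None for URLs without a network location.
        host = urlparse(url).hostname or ""
        if not re.search(blocked, host):
            kept.append(url)
    return kept


# Hypothetical links as they might appear on a search results page.
sample = [
    "http://blogsearch.google.com/help",
    "http://blog.example.org/post/1",
    "http://example.net/search?q=google",
]
print(external_links(sample, "google"))
# → ['http://blog.example.org/post/1', 'http://example.net/search?q=google']
```

The third link is kept even though "google" appears in its query string, because only the host is tested.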
The following script generates three-sentence summaries of current news stories:
select
    source_content as DOCUMENT,
    source_title as TITLE,
    source_url as URL
from
    crawl of http://news.google.com
        to depth 2
        following if url_host(URL) not matching 'google'
join where URL not matching 'google'
select
    URL,
    TITLE,
    summarize(clean(ARTICLE_BODY), 3) as SUMMARY
from
    articles
within
    inline DOCUMENT