Data Toolbar
Data Toolbar is an Internet Explorer add-on that collects catalog-style information from the web. The add-on converts structured web data into a table-style format that can be loaded into a spreadsheet or a database.
Algorithm
The program implements a variation of the genetic tree matching algorithm with respect to nested lists. That is, within a given web page, the program recursively traverses the branches of the page's DOM (Document Object Model) tree, aiming to detect nested lists of data items matching the format of the specified content. This approach has several known advantages over a simple string matching algorithm.
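The general idea of walking a DOM tree to find repeated list items can be illustrated with a short Python sketch. It is not Data Toolbar's actual algorithm, which is not documented here; it swaps in a much simpler heuristic that groups sibling subtrees by structural shape and keeps the largest group. All names (`Node`, `TreeBuilder`, `signature`, `find_records`, `extract_rows`, `to_csv`) and the sample markup are hypothetical, and the parser assumes well-formed HTML without void tags.

```python
import csv
import io
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node: tag name, child nodes, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from well-formed HTML (void tags are not handled)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def signature(node):
    """Structural shape of a subtree: its tag plus the shapes of its children."""
    return (node.tag, tuple(signature(c) for c in node.children))


def leaf_texts(node):
    """Collect text from a subtree: the node's own text, then its children's."""
    out = list(node.text)
    for child in node.children:
        out.extend(leaf_texts(child))
    return out


def find_records(node, min_repeats=2):
    """Recursively search for the largest group of same-shaped siblings."""
    groups = {}
    for child in node.children:
        if child.children:  # only items that contain inner fields
            groups.setdefault(signature(child), []).append(child)
    best = max(groups.values(), key=len, default=[])
    if len(best) < min_repeats:
        best = []
    for child in node.children:
        candidate = find_records(child, min_repeats)
        if len(candidate) > len(best):
            best = candidate
    return best


def extract_rows(html):
    """Parse the page and return one row of field texts per detected item."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [leaf_texts(item) for item in find_records(builder.root)]


def to_csv(rows):
    """Serialize rows as CSV text, ready for a spreadsheet or database import."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical catalog with an advertisement mixed into the list.
SAMPLE = """
<div><ul>
<li><b>Lamp</b><i>19.99</i></li>
<li><em>Advertisement</em></li>
<li><b>Desk</b><i>149.00</i></li>
<li><b>Chair</b><i>59.50</i></li>
</ul></div>
"""

if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE)))
```

Because items are matched by subtree shape rather than by literal text, the advertisement entry, which has a different shape, is skipped without any string pattern for it. A plain string match would need such a pattern for every kind of irregular row.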
Features
- Collection of data and images directly from Internet Explorer
- Collection of information from Details pages linked to the catalog
- Automatic processing of multi-page catalogs
- Support for irregular multi-row catalogs mixed with advertisements
Similar Tools
- Automation Anywhere - The Web Extractor is part of the larger automation system
- Easy Web Extract - Standalone application, Windows
- Mozenda - Web based service
- Newprosoft - Standalone application, includes an Agent, Windows
- OutWit - Firefox extension