Harvestman:technology
Harvestman is written in a minimal style of C++ and uses only the standard ANSI C libraries to ensure support for multiple platforms.
None of the C++ standard library or extended language features are used, in an effort to maximise speed and compatibility whilst minimising
build size and memory use: no STL, no typeid, no dynamic_cast or static_cast, no iostream.
The Harvestman architecture features an object-based hierarchy with its own RTTI (Run-Time Type Identification) system, giving it the ease of use and efficiency
of Java whilst maintaining the speed of C++.
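As an illustration of what a hand-rolled RTTI system can look like in this minimal style, here is a short sketch. It is not Harvestman's actual code: the names `TypeInfo`, `Object`, `Document`, `XmlDocument`, `isA` and `asDocument` are all hypothetical, and the approach shown (one static type descriptor per class, linked to its parent) is just one common way to do this without typeid or dynamic_cast.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical type descriptor: one static instance per class.
struct TypeInfo {
    const char*     name;    // human-readable class name
    const TypeInfo* parent;  // descriptor of the base class, NULL for the root
};

// Hypothetical root of the object hierarchy.
class Object {
public:
    virtual ~Object() {}

    // Each class overrides this to return its own descriptor.
    virtual const TypeInfo* type() const { return &kType; }

    // Walk the parent chain to see whether this object is of the
    // target type or of a type derived from it.
    bool isA(const TypeInfo* target) const {
        assert(target != NULL);
        for (const TypeInfo* t = type(); t != NULL; t = t->parent) {
            if (t == target) return true;
        }
        return false;
    }

    static const TypeInfo kType;
};
const TypeInfo Object::kType = { "Object", NULL };

class Document : public Object {
public:
    virtual const TypeInfo* type() const { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo Document::kType = { "Document", &Object::kType };

class XmlDocument : public Document {
public:
    virtual const TypeInfo* type() const { return &kType; }
    static const TypeInfo kType;
};
const TypeInfo XmlDocument::kType = { "XmlDocument", &Document::kType };

// Checked downcast. A C-style cast is used because the style described
// above avoids static_cast and dynamic_cast.
Document* asDocument(Object* o) {
    return (o != NULL && o->isA(&Document::kType)) ? (Document*)o : NULL;
}
```

With these definitions, an `XmlDocument` reports true for `isA(&Document::kType)`, and `asDocument` returns NULL for a plain `Object`. The type check costs a short pointer walk and needs no compiler RTTI support.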
The core technology in Harvestman is its ability to categorise and recognise content on the World Wide Web.
The Harvestman engine has four major components:
-
In addition to its primary role as the foundation for the search technology used in Harvestman, the source code
library is suitable for many other applications. It is particularly suited to architecting Web Services and
other internet tools and technologies.
-
Web crawling component, responsible for navigating and fetching files and documents of all descriptions from the internet.
-
Makes decisions that determine the actions the other components of the technology carry out. Arbiter incorporates many machine learning and artificial intelligence principles.
-
A detailed scanning, pattern matching, and content recognition engine capable of recognising many forms of data.
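One simple form of content recognition is matching the leading bytes of a fetched file against known signatures. The sketch below shows that idea only. It is an assumption about how such a check might look, not a description of Harvestman's scanner, and the names `Signature`, `kSignatures` and `recognise` are hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical signature record: a label plus the leading bytes
// that identify the format.
struct Signature {
    const char* label;
    const char* bytes;
    size_t      length;
};

// A few well-known file signatures, for illustration only.
static const Signature kSignatures[] = {
    { "png", "\x89PNG\r\n\x1a\n", 8 },
    { "gif", "GIF8",              4 },
    { "pdf", "%PDF-",             5 },
    { "xml", "<?xml",             5 }
};

// Return the label of the first signature that matches the start of
// the buffer, or "unknown" if none does.
const char* recognise(const unsigned char* data, size_t size) {
    assert(data != NULL || size == 0);
    const size_t count = sizeof(kSignatures) / sizeof(kSignatures[0]);
    for (size_t i = 0; i < count; ++i) {
        const Signature& s = kSignatures[i];
        if (size >= s.length && memcmp(data, s.bytes, s.length) == 0) {
            return s.label;
        }
    }
    return "unknown";
}
```

A real recognition engine would go well beyond leading-byte checks, but a table-driven design like this keeps new formats cheap to add and uses only `memcmp` from the ANSI C library.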
Harvestman technology can be used for many tasks, including augmenting search engines, web crawling, and deploying web services.
With extensive support for XML and related technologies, Harvestman is well suited to platforms where speed and memory limitations are important.
The Harvestman XML processor is modular: validation can be turned on and off, as can extended features that are not in the specification,
such as document repair.
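Switchable features of this kind are often modelled as a small set of option flags passed to the processor. The following is a minimal sketch of that pattern under stated assumptions: `XmlOption`, `XML_VALIDATE`, `XML_REPAIR` and `XmlOptions` are hypothetical names, not Harvestman's interface.

```cpp
#include <assert.h>

// Hypothetical feature flags for an XML processor.
enum XmlOption {
    XML_VALIDATE = 1 << 0,  // validate the document while parsing
    XML_REPAIR   = 1 << 1   // extended feature: attempt document repair
};

// Hypothetical option set; every feature starts switched off.
class XmlOptions {
public:
    XmlOptions() : flags_(0) {}

    void enable(XmlOption option) {
        assert(isKnown(option));
        flags_ |= (unsigned)option;
    }

    void disable(XmlOption option) {
        assert(isKnown(option));
        flags_ &= ~(unsigned)option;
    }

    bool enabled(XmlOption option) const {
        return (flags_ & (unsigned)option) != 0;
    }

private:
    // Guard against values outside the declared flags.
    static bool isKnown(XmlOption option) {
        return ((unsigned)option & ~(unsigned)(XML_VALIDATE | XML_REPAIR)) == 0;
    }

    unsigned flags_;
};
```

A parser built this way would consult `enabled(XML_VALIDATE)` or `enabled(XML_REPAIR)` at the relevant points, so a feature that is switched off costs only a single bit test.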