WebSearch.Net is an open-source research platform that provides uniform data source access, data modeling, feature calculation, data mining and more. Its flexibility and extensibility make it easier for web search researchers to run experiments. The platform can be used or extended from any language compatible with the .Net Framework 2.0, from C# (recommended) and VB.Net to C++ and Java. Because web search research covers such a broad body of knowledge, it is worth modeling these techniques and maintaining them in a single solution. The WebSearch.Net platform will grow more robust as more people get involved in this work.

I'm a student, and I devote my passion and spare time to developing this open-source project. If you would like to encourage and motivate me, please donate :)

 

Overall Architecture

The current solution comprises DataCenter.Net (uniform access to data sources), Model.Net (models of the entities in web search), Feature.Net (feature calculation), DataMiner.Net (a data mining library) and utilities such as Linguistics.Net (an adapter for SharpNLP, NICTCLAS, etc.) and Common.Net (common utilities for all modules in the solution). In addition, WebSearch.Net references several open-source projects from SourceForge and CodePlex: Lucene.Net 1.9 is a well-known indexer, and SharpNLP and NICTCLAS tokenize and POS-tag English and Chinese text, respectively. The system architecture may change over time, and more web search techniques will be added in the future.

 

1. WebSearch.DataCenter.Net

The data source types currently supported for research are query logs, web collections, link collections, query collections and corpora. The supported store types are the file system (indexed with Lucene.Net), databases (indexed with Microsoft SQL Server 2005 Full-Text Search) and the Internet (using search engines as the indexer). Both data source types and store types are extensible: researchers may add their own by following several rules.
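The rules for adding a data source or store type are not listed on this page, so the following is only a hypothetical sketch in plain Java (JDK 9 or later) of the general idea behind a uniform, extensible data center: every source exposes the same search call, and new (data source type, store type) combinations are plugged in through a registry. `DataSource`, `DataSourceRegistry`, `InMemoryQueryLog` and the "Memory" store type are invented for illustration and are not WebSearch.Net's actual API; the substring scan merely stands in for a real indexer such as Lucene.Net or SQL Server Full-Text Search.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class DataCenterSketch {
    // Hypothetical uniform interface: callers search the same way regardless of
    // whether the data lives in files, a database or behind a search engine.
    interface DataSource {
        List<String> search(String keyword);
    }

    // Hypothetical registry keyed by "dataSourceType/storeType".
    static final class DataSourceRegistry {
        private final Map<String, Supplier<DataSource>> factories = new HashMap<>();

        void register(String sourceType, String storeType, Supplier<DataSource> factory) {
            factories.put(sourceType + "/" + storeType, factory);
        }

        DataSource open(String sourceType, String storeType) {
            Supplier<DataSource> factory = factories.get(sourceType + "/" + storeType);
            if (factory == null) {
                throw new IllegalArgumentException("Unsupported: " + sourceType + "/" + storeType);
            }
            return factory.get();
        }
    }

    // Toy query log held in memory; a linear substring scan replaces a real index.
    static final class InMemoryQueryLog implements DataSource {
        private final List<String> queries;

        InMemoryQueryLog(List<String> queries) {
            this.queries = List.copyOf(queries);
        }

        @Override
        public List<String> search(String keyword) {
            List<String> hits = new ArrayList<>();
            for (String query : queries) {
                if (query.contains(keyword)) {
                    hits.add(query);
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        DataSourceRegistry registry = new DataSourceRegistry();
        // A researcher-defined combination: query log data in a "Memory" store.
        registry.register("QueryLog", "Memory",
            () -> new InMemoryQueryLog(List.of("cheap flights", "flights to tokyo", "weather")));
        DataSource log = registry.open("QueryLog", "Memory");
        System.out.println(log.search("flights")); // prints [cheap flights, flights to tokyo]
    }
}
```

Keeping the store behind a factory means client code that calls `search` does not change when a new store type is registered.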

 

2. WebSearch.Feature.Net

 

3. WebSearch.DataMiner.Net

 

4. WebSearch.Model.Net

 

5. WebSearch.Linguistics.Net

The current WebSearch.Linguistics.Net project is essentially an adapter (wrapper) around other open-source linguistic projects such as SharpNLP and NICTCLAS. It provides uniform access to linguistic analysis, including tokenization, POS tagging, synset detection, gender estimation and so on.
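To make the adapter idea concrete, here is a hypothetical Java sketch of a uniform tokenization interface with one backend per language. The two tokenizers are deliberately naive placeholders (a regular-expression splitter for English and one token per character for Chinese) standing in for SharpNLP and NICTCLAS; they do not call those libraries, and `Tokenizer`, `Linguistics` and the language codes "en" and "zh" are invented names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LinguisticsSketch {
    // Hypothetical uniform interface; real adapters would delegate to SharpNLP, NICTCLAS, etc.
    interface Tokenizer {
        List<String> tokenize(String text);
    }

    // Placeholder English tokenizer: splits on runs of non-letter, non-digit characters
    // and lower-cases each token.
    static final class SimpleEnglishTokenizer implements Tokenizer {
        @Override
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String token : text.split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    tokens.add(token.toLowerCase());
                }
            }
            return tokens;
        }
    }

    // Placeholder Chinese tokenizer: one token per non-whitespace character
    // (no real word segmentation, unlike NICTCLAS).
    static final class CharacterChineseTokenizer implements Tokenizer {
        @Override
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
            return tokens;
        }
    }

    // Facade that hides which backend handles which language.
    static final class Linguistics {
        private final Map<String, Tokenizer> byLanguage = new HashMap<>();

        void register(String language, Tokenizer tokenizer) {
            byLanguage.put(language, tokenizer);
        }

        List<String> tokenize(String language, String text) {
            Tokenizer tokenizer = byLanguage.get(language);
            if (tokenizer == null) {
                throw new IllegalArgumentException("No tokenizer for " + language);
            }
            return tokenizer.tokenize(text);
        }
    }

    public static void main(String[] args) {
        Linguistics ling = new Linguistics();
        ling.register("en", new SimpleEnglishTokenizer());
        ling.register("zh", new CharacterChineseTokenizer());
        System.out.println(ling.tokenize("en", "Web search, done right!")); // prints [web, search, done, right]
        System.out.println(ling.tokenize("zh", "网络 搜索"));
    }
}
```

Swapping a placeholder for a real backend only requires registering a different `Tokenizer` implementation; callers keep using `Linguistics.tokenize`.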

 

6. WebSearch.Crawler.Net

WebSearch.Crawler.Net is a utility for building a web crawler. It supports both single-threaded and multi-threaded crawling. Researchers can either give it a list of URLs to crawl or give it a seed site URL, from which the crawler follows links to retrieve web pages. The options include 'Allow Leave the Seed Site?', 'Crawling Depth?', 'Download Image?', 'Timer Interval?' and so on. You can even configure a network proxy when your local network access is restricted.
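As an illustration of how options like 'Crawling Depth?' and 'Allow Leave the Seed Site?' can interact, here is a single-threaded, breadth-first sketch in plain Java (JDK 16 or later, for records). It walks an in-memory link graph instead of downloading pages, and `crawl`, `Options` and the example URLs are hypothetical; this is not WebSearch.Crawler.Net's implementation.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class CrawlerSketch {
    // Hypothetical options mirroring 'Crawling Depth?' and 'Allow Leave the Seed Site?'.
    record Options(int maxDepth, boolean allowLeaveSeedSite) {}

    // A URL waiting in the frontier together with its distance from the seed.
    private record Entry(String url, int depth) {}

    // Breadth-first crawl over linkGraph (url -> outgoing links). A real crawler
    // would fetch each page and extract its links instead of looking them up.
    static List<String> crawl(String seedUrl, Map<String, List<String>> linkGraph, Options options) {
        String seedHost = URI.create(seedUrl).getHost();
        List<String> visitedOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Entry> frontier = new ArrayDeque<>();
        frontier.add(new Entry(seedUrl, 0));
        seen.add(seedUrl);

        while (!frontier.isEmpty()) {
            Entry current = frontier.poll();
            visitedOrder.add(current.url()); // "fetch" the page
            if (current.depth() >= options.maxDepth()) {
                continue; // do not follow links beyond the depth limit
            }
            for (String link : linkGraph.getOrDefault(current.url(), List.of())) {
                boolean sameSite = Objects.equals(seedHost, URI.create(link).getHost());
                if (!sameSite && !options.allowLeaveSeedSite()) {
                    continue; // stay on the seed site unless leaving is allowed
                }
                if (seen.add(link)) { // skip URLs already queued or visited
                    frontier.add(new Entry(link, current.depth() + 1));
                }
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "http://a.example/", List.of("http://a.example/1", "http://b.example/"),
            "http://a.example/1", List.of("http://a.example/2"),
            "http://b.example/", List.of("http://b.example/x"));
        System.out.println(crawl("http://a.example/", graph, new Options(1, false)));
        // prints [http://a.example/, http://a.example/1]
        System.out.println(crawl("http://a.example/", graph, new Options(2, true)));
        // prints [http://a.example/, http://a.example/1, http://b.example/, http://a.example/2, http://b.example/x]
    }
}
```

Breadth-first order makes the depth limit easy to enforce. A multi-threaded variant would typically share the frontier and the `seen` set among worker threads through concurrent collections.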

 

7. WebSearch.Maths.Net

WebSearch.Maths.Net provides a mathematics library that includes statistics and probability utilities.

 

8. WebSearch.Common.Net

WebSearch.Common.Net provides the common utilities and settings used throughout the solution.

Last edited Nov 8, 2012 at 6:04 PM by qinlanzhu, version 4