Knowlesys Web Data Mining System



The web is an ocean of information containing more than 10 billion web pages, 90% of which are unstructured or semi-structured. At present it is growing at a rate of 1 million pages per day. Information is increasing explosively, while people's time and energy remain limited. Information of real value to enterprises and individuals lies scattered across this worldwide ocean, and extracting it has become one of the most pressing tasks for research institutions working on Information Retrieval, Data Mining, Knowledge Management, Competitive Intelligence and related topics.

The Knowlesys Web Data Mining System (KWDMS) is like a huge blue whale that cruises this ocean of information every day, automatically and accurately extracting valuable data for you while excluding the large amount of irrelevant text on each page (such as page headers and footers, column listings and advertisements). Over more than five years, Knowlesys Software, Inc. developed KWDMS into a powerful web information extraction system. It has a layered architecture and a loosely coupled modular design made up of many subsystems.

KWDMS can extract large volumes of designated information from the web and integrate it into specified relational databases, helping customers unearth precious stones from the Internet minefield. Because the process converts information from semi-structured to structured form, from a dispersed to a concentrated state, from remote information into a locally stored asset, and from visual pages into digital records, the data remains available for extensive future use. KWDMS can extract data from many types of websites.
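The conversion from semi-structured pages into relational records described above can be illustrated with a minimal Python sketch. It is not KWDMS's actual extraction method: the page markup (`<div class="record">` blocks holding `<span data-field="...">` fields), the `jobs` table, the sample data and the helper names `RecordParser` and `store` are all hypothetical. Text outside the tagged fields, such as headers, footers and advertisements, is simply discarded, and the records are written to an in-memory SQLite database through Python's standard `sqlite3` module.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample page: two job records surrounded by navigation,
# an advertisement and a footer that should not be extracted.
PAGE = """
<html><body>
  <div class="header">Home | Jobs | Login</div>
  <div class="ad">Buy now!</div>
  <div class="record">
    <span data-field="title">Data Analyst</span>
    <span data-field="company">Acme Ltd</span>
    <span data-field="city">Dhaka</span>
  </div>
  <div class="record">
    <span data-field="title">Web Developer</span>
    <span data-field="company">Example Inc</span>
    <span data-field="city">Chittagong</span>
  </div>
  <div class="footer">Copyright notice</div>
</body></html>
"""

FIELDS = ("title", "company", "city")


class RecordParser(HTMLParser):
    """Collects field text from hypothetical <div class="record"> blocks."""

    def __init__(self):
        super().__init__()
        self.records = []     # one dict per completed record
        self._current = None  # record being parsed, or None outside records
        self._field = None    # field currently receiving text, or None
        self._depth = 0       # <div> nesting depth inside the current record

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._current is not None:
                self._depth += 1
            elif attrs.get("class") == "record":
                self._current, self._depth = {}, 1
        elif tag == "span" and self._current is not None and "data-field" in attrs:
            self._field = attrs["data-field"]
            self._current[self._field] = ""

    def handle_endtag(self, tag):
        if tag == "span" and self._field is not None:
            self._current[self._field] = self._current[self._field].strip()
            self._field = None
        elif tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                # The record's outer <div> closed: keep the finished record.
                self.records.append(self._current)
                self._current, self._field = None, None

    def handle_data(self, data):
        # Text outside a tagged field (navigation, ads, footers) is dropped.
        if self._field is not None:
            self._current[self._field] += data


def store(records, conn):
    """Insert extracted records into a relational table (missing fields -> NULL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (title TEXT, company TEXT, city TEXT)")
    rows = [{name: record.get(name) for name in FIELDS} for record in records]
    conn.executemany(
        "INSERT INTO jobs (title, company, city) VALUES (:title, :company, :city)", rows
    )
    conn.commit()


if __name__ == "__main__":
    parser = RecordParser()
    parser.feed(PAGE)
    parser.close()
    conn = sqlite3.connect(":memory:")
    store(parser.records, conn)
    for row in conn.execute("SELECT title, company, city FROM jobs ORDER BY title"):
        print(row)
```

Running the script prints the two job records as table rows, while the header, advertisement and footer text never reaches the database. A production extractor would need configurable per-site rules rather than one fixed markup pattern.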
In addition to extracting semi-structured field data, KWDMS can also extract free-text information such as e-mail addresses, as well as many types of multimedia files. KWDMS is characterized by stable operation, intelligent crawling and accurate extraction.

KWDMS is an information extraction platform: each new extraction task requires using the platform to configure a new web crawling and extraction script and its parameters. KWDMS includes a general database access layer that lets its back end connect to any relational database, such as MS SQL Server, Oracle, DB2, Sybase, MySQL and InterBase, and even file-based databases such as Microsoft Access. Whatever the database type, the extracted data can be viewed with a general database browser and exported to formats such as XML, CSV, HTML and Excel.

Where it is used

Acquiring Key Information: Obtain all kinds of data from online databases, for example resumes, job listings and business directories.
Competitive Information System: Use keywords to monitor your competitors' marketing information on Internet media.
Enterprise Content Management: Accurately acquire external content in batches and process it automatically.
Database Marketing: Extract comments and contact details of potential customers from message boards, forums and newsgroups.
Comparison System: Extract product prices from multiple websites so your users can compare prices and details in one place.
Enterprise Integration Portal: Embed real-time content from external websites into your EIP interface.
Integration of Internet information: Combine information extracted from websites in the same category, such as personal resumes, job postings, rental listings, product information and company directories.
Web Information Agent: Integrate up-to-date information from various websites that individuals or enterprises may be interested in, and deliver it to users by e-mail or publish it on your web pages, saving the time spent browsing and downloading.
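The free-text extraction and CSV export features mentioned above (for example, collecting contact addresses from message boards for database marketing) can be sketched in a few lines of Python. This is a hedged illustration, not the KWDMS implementation: `EMAIL_RE` is a deliberately simple pattern rather than a full RFC 5322 address grammar, and `extract_emails`, `to_csv` and the sample post are hypothetical.

```python
import csv
import io
import re

# Deliberately simple pattern (an assumption, not a full RFC 5322 grammar):
# a local part, "@", one or more domain labels, and an alphabetic TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical message-board post containing contact details.
POST = """Great product! Contact me at jane.doe@example.com or reply here.
Again: jane.doe@example.com. Sales team: sales@example.org."""


def extract_emails(text):
    """Return the distinct addresses found in text, in first-seen order."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def to_csv(rows, header):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    emails = extract_emails(POST)
    print(to_csv([[address] for address in emails], ["email"]), end="")
```

The script prints a one-column CSV with an `email` header followed by the two distinct addresses; the repeated address appears only once because `dict.fromkeys` keeps the first occurrence of each match.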

আমাদের কথা (Amader Kotha) gathers content scattered across the web in one place so that it is easy to find and read. The copyright of the content collected here belongs entirely to the author of the source site, and every item on Amader Kotha cites a reference link to its source site.
