Syntactic parser of English sentences
- simple English parser library



This is the web page of my project, Syntactic parser of English sentences. The best way to explore the project is to read the abstract below and visit the downloads section, where you can find the electronic version (PDF) of the written part. Speech recognition will be implemented in the final stage. I will probably use the Microsoft Speech SDK.
This work is a proof of concept of a syntactic parser and a formulation of the rules of a reverse grammar of English. The result is a simple test application that parses certain parts of English sentences.
The parsing rules are based on the literature and on an analysis of implementation problems.
The rules in this work can be used to implement a syntactic parser in several programming languages. They are a necessary extension of the user interface for anyone programming a semantic (interpretation) analyzer. This project is still under development and we are continually working to improve it.
The syntactic parser described here has a broad field of application. For example, it can be used to control machines and devices by voice, where the machine is able to recognize the meaning of a sentence. It can also be used for automatic analysis of electronic mail (antispam) and for data mining.


There are many things to do. The first and most important thing I am working on is improving the parser rules so that they can parse all English sentences. I am also working on a universal C# library.
At present I am working on three main threads:

A description of these plans is in the documentation section.


.downloads. September 8, 2006



The basic idea of how this parser and generator work is shown here:

O<--Written input sentence
O<->Rule based lexical analysis + Morphological Analysis & Dictionary
O<->Syntactic analysis
O-->Tree structure

The written sentence is transformed and every part of speech is recognized. This is a very complicated process with many problems. The syntactic parser then generates a grammar tree. This tree structure can be used in many applications: information can be extracted and data mined from it very quickly. It can also be used for grammar transformations (for example, one that changes an active sentence into a question or into the passive).
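The steps above can be illustrated with a minimal sketch. It is not the project's C# implementation: the tiny dictionary, the `Node` class, the two grammar rules (NP -> DET? ADJ* NOUN and S -> NP VERB (NP | ADJ)) and all function names are assumptions chosen only to show the flow from written sentence to tree structure.

```python
from dataclasses import dataclass, field

# Hypothetical toy dictionary: word -> part of speech.
LEXICON = {
    "the": "DET", "a": "DET",
    "ball": "NOUN", "dog": "NOUN",
    "black": "ADJ", "big": "ADJ", "round": "ADJ",
    "is": "VERB", "sees": "VERB",
}

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: str = ""

def tag(sentence):
    """Lexical analysis: split into words and look up each part of speech."""
    words = sentence.lower().rstrip(".?!").split()
    return [(w, LEXICON.get(w, "UNKNOWN")) for w in words]

def parse_np(tagged, i):
    """NP -> DET? ADJ* NOUN. Returns (node, next_index) or (None, i)."""
    start = i
    children = []
    if i < len(tagged) and tagged[i][1] == "DET":
        children.append(Node("DET", word=tagged[i][0]))
        i += 1
    while i < len(tagged) and tagged[i][1] == "ADJ":
        children.append(Node("ADJ", word=tagged[i][0]))
        i += 1
    if i < len(tagged) and tagged[i][1] == "NOUN":
        children.append(Node("NOUN", word=tagged[i][0]))
        return Node("NP", children), i + 1
    return None, start

def parse_sentence(sentence):
    """S -> NP VERB (NP | ADJ). Returns the tree, or None if no rule fits."""
    tagged = tag(sentence)
    subject, i = parse_np(tagged, 0)
    if subject is None or i >= len(tagged) or tagged[i][1] != "VERB":
        return None
    verb = Node("VERB", word=tagged[i][0])
    i += 1
    complement, j = parse_np(tagged, i)
    if complement is None and i < len(tagged) and tagged[i][1] == "ADJ":
        complement, j = Node("ADJ", word=tagged[i][0]), i + 1
    if complement is None or j != len(tagged):
        return None
    return Node("S", [subject, Node("VP", [verb, complement])])

def show(node):
    """Render the tree in bracket notation."""
    if node.word:
        return f"({node.label} {node.word})"
    return f"({node.label} " + " ".join(show(c) for c in node.children) + ")"

print(show(parse_sentence("The ball is black.")))
# (S (NP (DET the) (NOUN ball)) (VP (VERB is) (ADJ black)))
```

A sentence containing a word missing from the toy dictionary, or one that fits neither rule, makes `parse_sentence` return `None`. A real parser needs morphological analysis and far more rules, which is where the difficulties mentioned above come from.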

O<->Rule based sentence generation from input data
O-->Written output sentence

Work on this part of the project has not started yet. The data layout has not been specified and will probably be defined after the parser research is completed. But we can already say that generating sentences is much easier. Simple generators already exist. The rule 'the simpler the generator, the simpler the sentences it creates' applies.
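To illustrate that rule, here is a sketch of about the simplest generator possible. Since the real data layout is not yet defined, the input dictionary used here (`subject`, `determiner`, `attributes`) and both function names are pure assumptions. A generator this simple can only produce sentences of the form "X is a, b and c."

```python
def join_list(items):
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]

def generate(data):
    """Build one sentence from a hypothetical input layout:
    {"subject": "ball", "determiner": "the", "attributes": ["black"]}
    Returns None when there is nothing to say about the subject."""
    attributes = data.get("attributes", [])
    if not attributes:
        return None
    subject = f'{data.get("determiner", "the")} {data["subject"]}'
    sentence = f"{subject} is {join_list(attributes)}."
    return sentence[0].upper() + sentence[1:]

print(generate({"subject": "ball", "attributes": ["black"]}))
# The ball is black.
print(generate({"subject": "ball", "determiner": "your",
                "attributes": ["black", "very big", "round"]}))
# Your ball is black, very big and round.
```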

Two demonstration applications are planned. The first is a pseudo-intelligent chatterbot (a program designed for pseudo-conversation). The second is a machine translator that takes an English sentence and writes the sentence in Esperanto.

Pseudo-intelligent chatterbot with an ability to learn
O<--Tree structure
O<->Brain & Knowledge base
O-->List of similarities

It is a very simple program that generates a knowledge base from the tree structure. When the user asks something, it uses this base to generate a list of similarities. This list can be used as input data for the generator.
Sample conversation:
User: Do you know that I have a ball?
Computer: No, I do not.
User: And this ball is round and black.
Computer: Wow. Very interesting.
User: It is also very big.
User: What colour is my ball?
Computer: The ball is black.
User: What do you know about my ball?
Computer: Your ball is black, very big and round.
Computer: Do you like balls?
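
The knowledge base behind such a conversation could look roughly like the sketch below. It assumes a hypothetical tree layout of nested tuples, `(label, child, ...)` with leaves of the form `(label, word)`, and it simply attaches every adjective in a sentence to the first noun. Pronouns such as "it" are not resolved. The class and function names are illustrative only.

```python
from collections import defaultdict

def leaves(tree, label):
    """Collect the words of all leaves with the given label, left to right."""
    head, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):
        return [rest[0]] if head == label else []
    found = []
    for child in rest:
        found.extend(leaves(child, label))
    return found

class KnowledgeBase:
    def __init__(self):
        # noun -> attributes, in the order they were learned
        self.facts = defaultdict(list)

    def learn(self, tree):
        """Store every adjective of a statement under its first noun."""
        nouns = leaves(tree, "NOUN")
        if not nouns:
            return
        for adjective in leaves(tree, "ADJ"):
            if adjective not in self.facts[nouns[0]]:
                self.facts[nouns[0]].append(adjective)

    def similarities(self, tree):
        """List the known (noun, attribute) pairs for nouns in a question."""
        return [(noun, attribute)
                for noun in leaves(tree, "NOUN")
                for attribute in self.facts.get(noun, [])]

kb = KnowledgeBase()
kb.learn(("S", ("NP", ("DET", "this"), ("NOUN", "ball")),
               ("VP", ("VERB", "is"), ("ADJ", "round"), ("ADJ", "black"))))
kb.learn(("S", ("NP", ("DET", "the"), ("NOUN", "ball")),
               ("VP", ("VERB", "is"), ("ADJ", "big"))))
question = ("S", ("NP", ("DET", "my"), ("NOUN", "ball")),
                 ("VP", ("VERB", "is"), ("WH", "what")))
print(kb.similarities(question))
# [('ball', 'round'), ('ball', 'black'), ('ball', 'big')]
```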

English to Esperanto translator
Why Esperanto? Because it is a very simple language with a small number of rules. That makes it suitable for demonstrating that my work functions. A person with no knowledge of the language can adapt quickly and understand these rules. The translator will take the tree structure and generate a sentence in Esperanto. Basically, it will replace the English generator.

English sentence -> syntactic parser -> Esperanto generator -> Esperanto sentence
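
As a rough sketch of the last step in this chain, the code below turns a tree into an Esperanto sentence using three regular endings of the language: -o for nouns, -a for adjectives and -as for present-tense verbs, plus the accusative -n on direct objects. The nested-tuple tree layout, the small root dictionary and the function names are assumptions made for illustration. Determiners other than "the" are simply dropped, and a word missing from the dictionary raises a `KeyError`.

```python
# Hypothetical toy dictionary: English word -> Esperanto root.
ROOTS = {"ball": "pilk", "dog": "hund", "black": "nigr", "big": "grand",
         "round": "rond", "is": "est", "sees": "vid"}
ENDINGS = {"NOUN": "o", "ADJ": "a", "VERB": "as"}

def translate(tree, accusative=False):
    """Return the list of Esperanto words for a tree of nested tuples."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):   # leaf: (label, word)
        word = rest[0]
        if label == "DET":
            # Esperanto has only the definite article "la".
            return ["la"] if word == "the" else []
        form = ROOTS[word] + ENDINGS[label]
        if accusative and label in ("NOUN", "ADJ"):
            form += "n"
        return [form]
    if label == "VP":
        verb, *complements = rest
        transitive = verb[1] != "is"   # the copula takes no accusative
        words = translate(verb)
        for c in complements:
            words += translate(c, accusative=transitive and c[0] == "NP")
        return words
    words = []
    for child in rest:
        words += translate(child, accusative)
    return words

def to_sentence(tree):
    text = " ".join(translate(tree))
    return text[0].upper() + text[1:] + "."

tree = ("S", ("NP", ("DET", "the"), ("NOUN", "dog")),
             ("VP", ("VERB", "sees"),
                    ("NP", ("DET", "a"), ("ADJ", "big"), ("NOUN", "ball"))))
print(to_sentence(tree))
# La hundo vidas grandan pilkon.
```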



How does it work?
Check out the documentation section.

Can I join the team?
Yes, you can! Please contact the project administrator through the SourceForge system.

What programming language do you use?
The proof-of-concept application was written in Pascal. The project is now written purely in C#.

Can I see your recent source code?
Of course, download it from the downloads section.

Is it possible to use this parser in my own work?
It is possible, but keep in mind that it is still a work in progress and is not finished.

So is this work open source?
Yes. The first release (C# lib) is distributed under the LGPL.


Last modified on January 14, 2008. Copyright © 2006-2008 Andrej Pančík [SVK].

This project is kindly hosted on SourceForge.