Sep 5 – Apache Ant: The Apache Ant team currently maintains two lines of development, both based on the same Ant 1 codebase; the more recent line is the recommended one.
To be able to detect entities, the Name Finder needs a model. The model is dependent on the language and entity type it was trained for. The OpenNLP project offers a number of pre-trained name finder models which are trained on various freely available corpora.
They can be downloaded from our model download page. To find names in raw text, the text must be segmented into tokens and sentences. A detailed description is given in the sentence detector and tokenizer tutorial. It is important that the tokenization of the training data and of the input text is identical. The tool is only intended for demonstration and testing. Just copy this text to the terminal: "Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov."
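The demo can be started with the TokenNameFinder command line tool. A sketch of the invocation, assuming a downloaded English person model saved as en-ner-person.bin (the file name is an assumption of this sketch):

```sh
# Start the interactive demo with a pre-trained model; it reads
# tokenized sentences from standard input and echoes detected names.
bin/opennlp TokenNameFinder en-ner-person.bin
```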
Vinken is chairman of Elsevier N.
Name Finder API
To use the Name Finder in a production system, it is strongly recommended to embed it directly into the application instead of using the command line interface. First the name finder model must be loaded into memory from disk or another source. In the sample below it is loaded from disk. Loading throws an exception if, for example, the model content is not valid. After the model is loaded, the NameFinderME can be instantiated.
The NameFinderME class is not thread safe; it must only be called from one thread. To use multiple threads, multiple NameFinderME instances sharing the same model instance can be created. The input text should be segmented into documents, sentences and tokens. To perform entity detection, an application calls the find method for every sentence in the document.
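Put together, the load–instantiate–find sequence can be sketched as below. This assumes the opennlp-tools library on the classpath and a model file named en-ner-person.bin; both names are assumptions of the sketch, not given in the text.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NameFinderSketch {
    public static void main(String[] args) throws Exception {
        // Load the model from disk (file name is an assumption).
        try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);

            // NameFinderME is not thread safe: one instance per thread,
            // all instances sharing the same model.
            NameFinderME nameFinder = new NameFinderME(model);

            // One document, already split into tokenized sentences.
            String[][] document = {
                {"Pierre", "Vinken", ",", "61", "years", "old", "."}
            };

            for (String[] sentence : document) {
                Span[] names = nameFinder.find(sentence);
                for (Span name : names) {
                    System.out.println(name.getType() + ": " + name);
                }
            }

            // Reset adaptive feature data after each document; skipping this
            // can sharply degrade detection on subsequent documents.
            nameFinder.clearAdaptiveData();
        }
    }
}
```

Each detected Span carries begin/end token offsets plus the entity type.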
After every document, clearAdaptiveData must be called to clear the adaptive data in the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection rate after a few documents. The elements between the begin and end offsets are the name tokens.
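The offset convention can be illustrated without the library: the toy helper below mimics how a Span's half-open [begin, end) token offsets select the name tokens (the real Span class lives in opennlp.tools.util; this helper is purely illustrative).

```java
public class SpanDemo {
    // Joins the tokens covered by a half-open [begin, end) span,
    // mimicking how Span offsets select name tokens.
    static String coveredText(String[] tokens, int begin, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = begin; i < end; i++) {
            if (i > begin) sb.append(' ');
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Pierre", "Vinken", ",", "61", "years", "old"};
        // A span with begin offset 0 and end offset 2 covers "Pierre Vinken".
        System.out.println(coveredText(tokens, 0, 2));
    }
}
```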
In this case the begin offset is 0 and the end offset is 2. The Span object also knows the type of the entity; in this case it is person, as defined by the model. It can be retrieved with a call to Span.getType(). In addition to the statistical Name Finder, OpenNLP also offers a dictionary-based and a regular-expression-based name finder implementation. TODO: Explain how to retrieve probs from the name finder for names and for non recognized names.
Name Finder Training
The pre-trained models might not be available for a desired language, might not detect important entities, or their performance might not be good enough outside the news domain.
These are the typical reasons to do custom training of the name finder on a new corpus, or on a corpus which is extended by private training data taken from the data which should be analyzed.
Training Tool
OpenNLP has a command line tool which is used to train the models available from the model download page on various corpora.
The data can be converted to the OpenNLP name finder training format, which is one sentence per line. Some other formats are available as well. The sentences must be tokenized and contain spans which mark the entities. Documents are separated by empty lines, which trigger the reset of the adaptive feature generators. A training file can contain multiple types; if it does, the created model will also be able to detect these multiple types.
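A fragment in this format might look as follows; the &lt;START:type&gt; … &lt;END&gt; markers delimit entity spans, and the sentences here are illustrative:

```
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director .
Mr . <START:person> Vinken <END> is chairman of Elsevier .
```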
The training data should contain a large number of sentences to create a model which performs well. It is now assumed that the English person name finder model should be trained from a file called en-ner-person.
The following command will train the name finder and write the model to en-ner-person. It is also possible to use the -resources parameter to generate features based on external knowledge, such as word representation clustering features. The external resources must all be placed in a resource directory, which is then passed as a parameter.
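Without the optional resource parameters, the training invocation might look like this (language, type, and file names are assumptions following the conventions above):

```sh
bin/opennlp TokenNameFinderTrainer -lang en -type person \
    -model en-ner-person.bin -data en-ner-person.train -encoding UTF-8
```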
If this option is used, it is then required to pass, via the -featuregen parameter, an XML custom feature generator descriptor which includes some of the clustering features shipped with the TokenNameFinder.
Currently three formats of clustering lexicons are accepted; one of them is a space-separated, two-column file specifying the token and the cluster class, as generated by toolkits such as word2vec. Additionally it is possible to specify the number of iterations, the cutoff, and to overwrite all types in the training data with a single type. Basically three steps are necessary to train via the API: the application must open a sample data stream, call the NameFinderME.train method, and save the resulting model.
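The three steps can be sketched as follows. This assumes opennlp-tools on the classpath and the training file name used above; the exact train signature has varied between OpenNLP versions, so treat this as a sketch rather than the definitive API:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: open a sample data stream over the training file
        // (file name is an assumption of this sketch).
        ObjectStream<NameSample> samples = new NameSampleDataStream(
            new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("en-ner-person.train")),
                StandardCharsets.UTF_8));

        // Step 2: train the model.
        TokenNameFinderModel model = NameFinderME.train("en", "person", samples,
            TrainingParameters.defaultParams(), new TokenNameFinderFactory());

        // Step 3: save the model to disk.
        try (OutputStream out =
                 new BufferedOutputStream(new FileOutputStream("en-ner-person.bin"))) {
            model.serialize(out);
        }
    }
}
```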
Users who want to experiment with the feature generation can provide a custom feature generator, either via the API or via an XML descriptor file.
Feature Generation defined by API
The custom generator must be used both for training and for detecting the names. If the feature generation during training time and detection time differs, the name finder might not be able to detect names.
The javadoc of the feature generator classes explains what the individual feature generators do. To write a custom feature generator, implement the AdaptiveFeatureGenerator interface or, if it need not be adaptive, extend the FeatureGeneratorAdapter.
To detect names, the model which was returned from the train method must be passed to the NameFinderME constructor. When feature generation is defined by an XML descriptor instead, the descriptor file is stored inside the model after training, and the feature generators are configured correctly when the name finder is instantiated. The sample XML contains additional feature generators with respect to the API defined above. The following table shows the supported elements:
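A descriptor combining several built-in generators might look like the following; the element names mirror the feature generator class names, and the exact supported set depends on the OpenNLP version, so treat this as an illustrative sketch:

```xml
<generators>
  <cache>
    <generators>
      <window prevLength="2" nextLength="2">
        <tokenclass/>
      </window>
      <window prevLength="2" nextLength="2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
    </generators>
  </cache>
</generators>
```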
Optional Tasks
Ant supports a number of optional tasks. An optional task is a task which typically requires an external library to function. The optional tasks are packaged together with the core Ant tasks. The external libraries required by each of the optional tasks are detailed in the Library Dependencies section. Placing those libraries in Ant's lib directory makes the JAR files available to all Ant users and builds.
Apache Ant 1.8.4 Manual
The main known usage of Ant is the build of Java applications. Ant supplies a number of built-in tasks that allow compiling, assembling, testing and running Java applications. More generally, Ant can be used to pilot any type of process which can be described in terms of targets and tasks. Ant is written in Java.
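The target/task model can be illustrated with a minimal build file; the names src, build, and the JAR file name are conventions chosen for this sketch, not requirements:

```xml
<project name="example" default="dist" basedir=".">
  <property name="src" location="src"/>
  <property name="build" location="build"/>

  <target name="compile" description="Compile the sources">
    <mkdir dir="${build}"/>
    <javac srcdir="${src}" destdir="${build}"/>
  </target>

  <target name="dist" depends="compile" description="Package as a JAR">
    <jar destfile="example.jar" basedir="${build}"/>
  </target>
</project>
```

Running `ant` executes the default target (`dist`), which first runs the targets it depends on.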
Apache Ant is a Java library and command-line tool that helps build software.
Downloading Apache Ant
Use the links below to download a binary distribution of Ant from one of our mirrors. It is good practice to verify the integrity of the distribution files, especially if you are using one of our mirror sites. In order to do this you must use the signatures from our main distribution directory.
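Verification typically combines a checksum check with a PGP signature check; apache-ant-&lt;version&gt;-bin.tar.gz below is a placeholder for the actual archive name you downloaded:

```sh
# Compare against the SHA-512 checksum from the main distribution directory.
sha512sum -c apache-ant-<version>-bin.tar.gz.sha512

# Or verify the PGP signature using the project's KEYS file.
gpg --import KEYS
gpg --verify apache-ant-<version>-bin.tar.gz.asc apache-ant-<version>-bin.tar.gz
```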