Text File Filter and Formatter
Version 2.12

This product can be used to filter or process text files, to create
new files with different information or formats. It can also extract
content from XML, PDF, or HTML files and perform some level of
analysis. More advanced organisation and search capabilities are also
included. A demo version of the program is available to download, and
the user guide describes the product features in detail and their
usefulness for text analysis and formatting.
This page describes a text file processing program that can filter
or format text-based content. As well as the formatting and filtering
features, the application now includes more advanced organiser and
search capabilities. For text processing itself, it can read text or
XML files and apply a sequence of operations to the text to transform
it into a different format or structure. This can also involve
removing content, and a possibly unique feature is the ability to
generate ordered lists of words from files of arbitrary text. The
program can also extract the contents of PDF or HTML files, so that
only the text content is processed, and it can perform some of the
more common formatting operations. The key features are as follows:
- Re-structure the text based on sequences of filtering options.
- Re-format text files for different specifications (for example line
width, spacing, indentation, character replacement, upper/lower case).
- Filter text to remove/replace specified words or tokens.
- Remove/keep specific words or lines.
- Line truncation before or after specified characters.
- Extract only the content of XML files, or re-format to use tag
names as well.
- Limited conversion back to XML format.
- Extract the text content of HTML or PDF files.
- Generate ordered lists of words from re-structured text.
- A toolbar for quick access to some of the formatting options,
including a processing status field.
- Basic XPath-style search operations, with wildcards.
- Selection through search of specific areas of text, to be
reformatted separately from the whole document.
- A manual panel that allows you to process the text at an
individual row, column or cell level. This is different from
processing a whole document and gives you easy access to the
parts that are more difficult to define.
- An analysis panel that allows you to perform some level of analysis /
data mining over the text. This is configurable and you can save the
results to a file.
- A document organiser that can organise or group your documents based
on their content.
- Undo / redo options for the last 5 operations.
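As an illustration of the XML extraction feature listed above, the following is a minimal Python sketch and not the program's own code (the program itself is written in Java). The function name `extract_text` and the `with_tags` option are hypothetical, chosen only to show the difference between extracting the content alone and keeping the tag names as well.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string, with_tags=False):
    """Return the text of each element, one entry per non-empty text node.

    Hypothetical helper: text that follows a closing tag (tail text)
    is ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_string)
    lines = []
    for element in root.iter():
        text = (element.text or "").strip()
        if not text:
            continue  # skip elements that only contain other elements
        lines.append(f"{element.tag}: {text}" if with_tags else text)
    return lines


sample = "<book><title>Text Tools</title><author>A. Writer</author></book>"
print(extract_text(sample))                  # content only
print(extract_text(sample, with_tags=True))  # tag names kept as labels
```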
Product Details
The product consists of three different panels that perform different
kinds of operations. Two of the panels can filter or format the text
and update the result, while the third panel can perform a limited
amount of analysis over the text. The panels in the GUI are as
follows:
- The General panel allows for general filtering operations over the
whole text file. This is also the panel provided in the demo version.
- The Manual panel allows the user to manually specify certain cells or
areas to filter or remove.
- The Analysis panel can perform some statistical analysis over the raw
or filtered text.
General Panel
This panel can perform filtering and formatting processes over the
whole text document and is shown below. You specify a number of
filtering/formatting options that are to be performed in sequence and
then run them to change the text. This panel consists of a left-hand
side with the filtering/formatting options and a right-hand side that
shows the document after it has been changed. You can reformat the
whole document, or select specific sections that will be reformatted
by themselves.
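The idea of running a chosen sequence of operations in order can be sketched in a few lines of Python. This is only an assumed model of the behaviour described above, not the program's implementation; the step names (`truncate_after`, `to_lower`, `remove_words`, `rewrap`) and `run_pipeline` are hypothetical.

```python
import textwrap


def to_lower(lines):
    """Convert every line to lower case."""
    return [line.lower() for line in lines]


def remove_words(words):
    """Build a step that removes the given words from every line."""
    banned = set(words)

    def step(lines):
        return [" ".join(w for w in line.split() if w not in banned)
                for line in lines]
    return step


def truncate_after(char):
    """Build a step that keeps only the text before the first `char`."""
    def step(lines):
        return [line.split(char, 1)[0].rstrip() for line in lines]
    return step


def rewrap(width):
    """Build a step that re-flows all the text to a new line width."""
    def step(lines):
        return textwrap.wrap(" ".join(lines), width=width)
    return step


def run_pipeline(text, steps):
    """Apply each step in order; every step maps a list of lines to a new list."""
    lines = text.splitlines()
    for step in steps:
        lines = step(lines)
    return "\n".join(lines)


text = "The Quick brown fox # a comment\njumps over the lazy dog"
result = run_pipeline(
    text,
    [truncate_after("#"), to_lower, remove_words(["the"]), rewrap(20)],
)
print(result)
```

The order of the steps matters in this sketch: `remove_words(["the"])` only matches "The" because `to_lower` has already run.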

Manual Panel
The General panel allows you to choose a sequence of operations that
are automatically applied to the whole document, or document sections.
The Manual filtering panel provides additional functionality through a
grid or tabular format that allows you to select specific columns and
rows to delete or format. This means that you can more easily access
columns or single words in the middle of lines that a general line
format could not easily change. The grid table also provides a popup
menu with additional formatting options. So with the grid format you
can:
- Select certain (whole or partial) columns or rows of data to remove,
while keeping the rest of the text.
- Insert new columns of text at specific positions, or alter the
existing text in these cells with additional text.
- Save the final formatted text with a user-specified separator
between the cell words, such as a comma for a CSV file.
- Simply remove certain rows of text from the document before processing.
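The grid operations listed above can be pictured with a short Python sketch. It assumes that each line is split on whitespace into cells, which may not match how the program builds its grid, and all function names here are hypothetical.

```python
import csv
import io


def to_grid(text):
    """Split each line into cells on whitespace (an assumed cell rule)."""
    return [line.split() for line in text.splitlines()]


def drop_rows(grid, indexes):
    """Remove whole rows by zero-based index."""
    drop = set(indexes)
    return [row for i, row in enumerate(grid) if i not in drop]


def drop_columns(grid, indexes):
    """Remove whole columns by zero-based index, keeping the other cells."""
    drop = set(indexes)
    return [[cell for i, cell in enumerate(row) if i not in drop]
            for row in grid]


def insert_column(grid, position, value):
    """Insert the same new cell into every row at `position`."""
    return [row[:position] + [value] + row[position:] for row in grid]


def save_grid(grid, separator=","):
    """Join the cells with a single-character separator, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerows(grid)
    return buffer.getvalue()


text = "id name score\n1 alice 90\n2 bob 85\n3 carol 70"
grid = to_grid(text)
grid = drop_rows(grid, [3])         # remove the last row
grid = drop_columns(grid, [0])      # remove the "id" column
grid = insert_column(grid, 1, "x")  # add a new column in the middle
output = save_grid(grid)
print(output)
```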
A graphic of the Manual panel is shown below:

Analysis Panel
The product also comes with a panel that can provide a
limited amount of analysis of what the newly formatted text
represents. The analysis essentially performs some statistical
operations on the text words, comparing them to determine the most
popular groupings. This analysis includes:
- The standard line, word and character counts, like those reported
for a Word document, for example.
- Other analysis possibilities, where the default is to tell you what
the most popular words or word sequences are.
The two optional choices count the following:
- The most popular word count reads every word and counts the number of
times that it occurs. The top specified number of words is then
output.
- The most popular sequences option looks for popular word sequences in
the text and again outputs the most popular ones. For example, the
number of times two or three words appear together.
The graphic below is an example of the analysis panel, with an analysis
of the menu text of the General panel displayed.
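The two counts described above can be approximated with the following Python sketch. It is a simplified illustration under stated assumptions rather than the program's algorithm: words are taken to be runs of letters or apostrophes, case is ignored, and the function names are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lower-case the text and split it into words (assumed word rule)."""
    return re.findall(r"[a-z']+", text.lower())


def popular_words(text, top=3):
    """Count every word and return the `top` most frequent ones."""
    return Counter(tokenize(text)).most_common(top)


def popular_sequences(text, length=2, top=3):
    """Count runs of `length` consecutive words and return the most frequent."""
    words = tokenize(text)
    # zip the word list against shifted copies of itself to form the sequences
    sequences = zip(*(words[i:] for i in range(length)))
    return Counter(" ".join(seq) for seq in sequences).most_common(top)


text = "the cat sat on the mat and the cat ran"
print(popular_words(text, top=2))                # [('the', 3), ('cat', 2)]
print(popular_sequences(text, length=2, top=1))  # [('the cat', 2)]
```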

The analysis can be performed over the previously filtered text, a
single text file, or text from a number of files combined. It is
configurable and can be saved to a file in XML format, so that several
different analysis sets can then be compared with each other. Optional
analysis features also allow you to remove numbers, symbols, or
commonly occurring words from the analysis.
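A rough Python sketch of the optional filters and of saving a result as XML is given below. The common-word list, the function names and the XML layout (`analysis` and `word` elements) are assumptions made for illustration only. The program's actual file format is not described on this page.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Illustrative list only; the program's own common-word list is not known.
COMMON_WORDS = {"the", "a", "and", "of", "to"}


def filtered_counts(text, drop_numbers=True, drop_symbols=True,
                    drop_common=True):
    """Count words after optionally removing symbols, numbers and common words."""
    counts = Counter()
    for token in text.lower().split():
        if drop_symbols:
            token = "".join(ch for ch in token if ch.isalnum())
        if not token:
            continue
        if drop_numbers and token.isdigit():
            continue
        if drop_common and token in COMMON_WORDS:
            continue
        counts[token] += 1
    return counts


def to_xml(counts, name):
    """Serialise one analysis set to an XML string (hypothetical layout)."""
    root = ET.Element("analysis", name=name)
    for word, count in counts.most_common():
        ET.SubElement(root, "word", count=str(count)).text = word
    return ET.tostring(root, encoding="unicode")


counts = filtered_counts("The price of tea: 42 pounds, and the tea was good!")
xml_text = to_xml(counts, "sample")
print(xml_text)
```

Keeping each analysis set in its own XML element would make it straightforward to load two saved sets later and compare their counts.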
Example Processes
The following pages give further examples of formatting or filtering
operations:
- Search for particular lines and reformat them.
- XPath-style query to search for and filter text lines.
- Reformat a document to a new width and layout.
- Analyse a list of words.
You can download the demo version of the program, either as a Windows
installer or as a zip file with the jar exe. The demo version is fully
functional, except for the following: the undo / redo buttons are
disabled and a reminder window opens when you go to load or save a
file. As there is no dedicated version changes page, you will need to
check the documentation for details of what has changed between
versions.
Buying the Application
If you buy a copy of the program, you will be entitled to free
updates. When you register your copy, you will receive a password
for accessing the free updates. The only cost will be a £1.00
admin fee.
System Requirements
Your system must meet the following requirements for the software
to work:
- The installation package only runs on Windows, as a Windows setup.exe.
- However, on request, a zip file can be provided for other
operating systems.
- The program was written in Java and requires a relatively recent
version of the Java Runtime Environment (JRE) to run.
If you are unsure, then you can download the demo version first. It
has the same interface, so if it is suitable then the professional
version will also be suitable.
Purchasing the Software
The professional version can be bought for £30.00 from the Payloadz
online site, by clicking on the link below. If you contact me through
this site, I can provide an operating-system-independent zip file
instead.
Payloadz