Natural language processing (NLP)

What is natural language processing?

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written — referred to as natural language. It is a component of artificial intelligence (AI).

NLP has existed for more than 50 years and has roots in the field of linguistics. It has a variety of real-world applications in a number of fields, including medical research, search engines and business intelligence.

How does natural language processing work?

NLP enables computers to understand natural language as humans do. Whether the language is spoken or written, natural language processing uses artificial intelligence to take real-world input, process it, and make sense of it in a way a computer can understand. Just as humans have different sensors — such as ears to hear and eyes to see — computers have programs to read text and microphones to collect audio. And just as humans have a brain to process that input, computers have programs to process their respective inputs. At some point in processing, the input is converted to code that the computer can understand.

There are two main phases to natural language processing: data preprocessing and algorithm development.

Data preprocessing involves preparing and “cleaning” text data so that machines can analyze it. Preprocessing puts data in a workable form and highlights features in the text that an algorithm can work with. There are several ways this can be done (a short code sketch combining these steps appears after the list), including:

  • Tokenization. This is when text is broken down into smaller units, called tokens, to work with.
  • Stop word removal. This is when common words are removed from the text so that the unique words offering the most information about the text remain.
  • Lemmatization and stemming. This is when words are reduced to their root forms for processing.
  • Part-of-speech tagging. This is when words are marked based on the part of speech they are — such as nouns, verbs and adjectives.
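
As a concrete illustration of these preprocessing steps, the following is a minimal sketch in Python. The choice of NLTK and the sample sentence are assumptions for demonstration; the article does not prescribe any particular library, and the relevant NLTK resources (punkt, stopwords, wordnet, averaged_perceptron_tagger) must be downloaded via nltk.download() before the code will run.

  # Minimal preprocessing sketch (NLTK is an assumed library choice).
  import nltk
  from nltk.corpus import stopwords
  from nltk.stem import WordNetLemmatizer

  text = "The dogs barked loudly at the mail carrier."

  # Tokenization: break the text into individual word tokens.
  tokens = nltk.word_tokenize(text)

  # Stop word removal: drop common words that carry little information.
  stop_words = set(stopwords.words("english"))
  content_tokens = [t for t in tokens if t.lower() not in stop_words]

  # Lemmatization: reduce words to their dictionary root forms ("dogs" -> "dog").
  lemmatizer = WordNetLemmatizer()
  lemmas = [lemmatizer.lemmatize(t) for t in content_tokens]

  # Part-of-speech tagging: label each token as a noun, verb, adjective and so on.
  tagged = nltk.pos_tag(tokens)

  print(tokens)          # ['The', 'dogs', 'barked', 'loudly', 'at', 'the', 'mail', 'carrier', '.']
  print(content_tokens)  # stop words such as 'The', 'at' and 'the' removed
  print(lemmas)          # 'dogs' becomes 'dog'
  print(tagged)          # [('The', 'DT'), ('dogs', 'NNS'), ('barked', 'VBD'), ...]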

Once the data has been preprocessed, an algorithm is developed to process it. There are many different natural language processing algorithms, but two main types are commonly used:

  • Rules-based system. This system uses carefully designed linguistic rules. This approach was used early in the development of natural language processing, and it is still used today.
  • Machine learning-based system. Machine learning algorithms use statistical methods. They learn to perform tasks based on training data they are fed, and adjust their methods as more data is processed. Using a combination of machine learning, deep learning and neural networks, natural language processing algorithms hone their own rules through repeated processing and learning.
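
To make the contrast concrete, the following is a minimal sketch in Python. The toy sentiment task, the example texts and the choice of scikit-learn are illustrative assumptions rather than anything prescribed here; the point is only that the first approach encodes a linguistic rule by hand, while the second learns its decision rule statistically from labeled training data.

  # Rules-based vs. machine learning-based processing: an illustrative sketch.
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.naive_bayes import MultinomialNB

  # Rules-based system: a lexical rule designed by hand.
  def rule_based_sentiment(text: str) -> str:
      words = set(text.lower().split())
      return "positive" if words & {"great", "excellent", "friendly"} else "negative"

  # Machine learning-based system: the model derives its own statistics from training data.
  train_texts = [
      "great product and friendly support",
      "excellent value, works perfectly",
      "terrible quality and rude support",
      "awful experience, total waste",
  ]
  train_labels = ["positive", "positive", "negative", "negative"]

  vectorizer = CountVectorizer()  # bag-of-words features
  model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

  new_text = "friendly staff and excellent quality"
  print(rule_based_sentiment(new_text))                      # positive, by the hand-written rule
  print(model.predict(vectorizer.transform([new_text]))[0])  # positive, learned from the data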

Why is natural language processing important?

Businesses work with massive quantities of unstructured, text-heavy data and need a way to process it efficiently. Much of the information created online and stored in databases is natural human language, and until recently, businesses could not effectively analyze this data. This is where natural language processing is useful.

The advantage of natural language processing can be seen when considering the following two statements: “Cloud computing insurance should be part of every service-level agreement,” and, “A good SLA ensures an easier night’s sleep — even in the cloud.” If a user relies on natural language processing for search, the program will recognize that cloud computing is an entity, that cloud is an abbreviated form of cloud computing and that SLA is an industry acronym for service-level agreement.
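
As a sketch of the entity handling described above, the following Python snippet runs a pre-trained named entity recognizer over the two statements. spaCy and its small English model (en_core_web_sm) are assumed to be installed; whether a given model actually treats “cloud computing” as an entity or links “SLA” to “service-level agreement” depends on how it was trained, so this shows the mechanism rather than a guaranteed result.

  # Named entity recognition sketch (spaCy is an assumed library choice;
  # install the model first with: python -m spacy download en_core_web_sm).
  import spacy

  nlp = spacy.load("en_core_web_sm")

  statements = [
      "Cloud computing insurance should be part of every service-level agreement.",
      "A good SLA ensures an easier night's sleep - even in the cloud.",
  ]

  for text in statements:
      doc = nlp(text)
      # Print whatever spans the model recognizes as named entities.
      print([(ent.text, ent.label_) for ent in doc.ents])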

These are the kinds of ambiguous elements that frequently appear in human language and that machine learning algorithms have historically struggled to interpret. Now, with improvements in deep learning and machine learning methods, algorithms can interpret them effectively. These improvements expand the breadth and depth of data that can be analyzed.

Techniques and methods of natural language processing

Syntax analysis and semantic analysis are two main techniques used in natural language processing.

Syntax is the arrangement of words in a sentence to make grammatical sense. NLP uses syntax to assess the meaning of a sentence based on grammatical rules. Syntax techniques include:

  • Parsing. This is the grammatical analysis of a sentence. Example: A natural language processing algorithm is fed the sentence, “The dog barked.” Parsing involves breaking this sentence into parts of speech — i.e., dog = noun, barked = verb. This is useful for more complex downstream processing tasks.
  • Word segmentation. This is the act of taking a string of text and deriving word forms from it. Example: A person scans a handwritten document into a computer. The algorithm would be able to analyze the page and recognize that the words are divided by white spaces.
  • Sentence breaking. This places sentence boundaries in large texts. Example: A natural language processing algorithm is fed the text, “The dog barked. I woke up.” Using sentence breaking, the algorithm recognizes the period that splits the text into two sentences.
  • Morphological segmentation. This divides words into smaller parts called morphemes. Example: The word untestably would be broken into [[un[[test]able]]ly], where the algorithm recognizes “un,” “test,” “able” and “ly” as morphemes. This is especially useful in machine translation and speech recognition.
  • Stemming. This reduces inflected words to their root forms. Example: In the sentence, “The dog barked,” the algorithm would be able to recognize that the root of the word “barked” is “bark.” This would be useful if a user were analyzing a text for all instances of the word bark, as well as all of its conjugations. The algorithm can see that they are essentially the same word even though the letters are different.
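
The sketch below applies a few of these syntax techniques to the article's example sentences. As before, NLTK is an assumed library choice, and its punkt and averaged_perceptron_tagger resources must be downloaded first.

  # Sentence breaking, parsing (via part-of-speech tags) and stemming with NLTK.
  import nltk
  from nltk.stem import PorterStemmer

  text = "The dog barked. I woke up."

  # Sentence breaking: place sentence boundaries in the text.
  sentences = nltk.sent_tokenize(text)
  print(sentences)  # ['The dog barked.', 'I woke up.']

  # Parsing: break the first sentence into parts of speech (dog = noun, barked = verb).
  tagged = nltk.pos_tag(nltk.word_tokenize(sentences[0]))
  print(tagged)     # [('The', 'DT'), ('dog', 'NN'), ('barked', 'VBD'), ('.', '.')]

  # Stemming: reduce inflected forms such as "barked" and "barking" to the root "bark".
  stemmer = PorterStemmer()
  print([stemmer.stem(w) for w in ["bark", "barked", "barking"]])  # ['bark', 'bark', 'bark']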

Original article: https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP