Difference between revisions of "NLP"

From UFRC

Revision as of 15:45, 9 May 2024

This page describes the natural language processing (NLP) software available on HiPerGator. NLP is a branch of artificial intelligence (AI) that helps computers understand and respond to human language. It is used in applications such as voice assistants, chatbots, and translation apps. NLP combines language rules with machine learning to help computers grasp not just words, but also the intent and feelings behind them. It also supports AI work in other fields; in healthcare, for example, it helps analyze medical records to aid patient care. Research Computing can help with language modeling for knowledge exploration, measurement, classification, summarization, conversational AI, or other uses via support requests or consulting.

Environment Modules for NLP

  • nlp: module load nlp provides a Python environment with PyTorch, torchtext, NLTK, spaCy, transformers, sentence-transformers, Flair, BERTopic for topic modeling, sentencepiece, RAPIDSai for data processing and machine learning algorithms, gensim, scikit-learn, and more.
    • Use the following command to list the available versions on HiPerGator-AI:
    • module spider nlp


  • ngc-pytorch: module load ngc-pytorch provides a Singularity container Python environment with PyTorch, including the Nvidia Apex optimizers required for Megatron-LM. Research Computing has pretrained, large-parameter Megatron language models available to HiPerGator users. See /data/ai/examples/nlp or AI_Examples for more information.
    • Use the following command to list the available versions on HiPerGator-AI:
    • module spider ngc-pytorch

  • Flair NLP: See FlairNLP for more information.


  • nemo: module load nemo provides a Singularity container environment with Python and Nvidia NeMo. NeMo offers NLP task training, speech-to-text and text-to-speech models, and the option to apply your own pretrained Megatron language models.
    • Use the following command to list the available versions on HiPerGator-AI:
    • module spider nemo


  • pytorch or tensorflow: Use module spider pytorch or module spider tensorflow to list the versions available. If neither the nlp environments nor these environments have the libraries you require, you may need to create a Conda environment. See Conda and Managing_Python_environments_and_Jupyter_kernels for more details.
    • Use the following command to list the available versions on HiPerGator-AI:
    • module spider pytorch
    • module spider tensorflow


  • spark-nlp: See our Spark help doc to start a Spark cluster. The spark-nlp Python module is available in tensorflow/2.4.1.


  • parlai: A conversational AI framework by Facebook that includes a wide variety of models, from 110M to 9B parameters.
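
As a rough sketch of how the modules listed above might be explored from a shell, the snippet below lists the available versions of a module and then loads it. It assumes the `module` command is defined in the current session, as it would be on a HiPerGator node. The helper name `show_and_load` is hypothetical, and the guard only exists so the script exits cleanly on machines without environment modules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the available versions of a module, then load it.
# Assumes the `module` command (environment modules) exists in this shell.
show_and_load() {
    local name="$1"
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; skipping ${name}"
        return 0
    fi
    module spider "${name}"   # list the available versions
    module load "${name}"     # load the default version
}

show_and_load nlp
```

The same helper can be called with ngc-pytorch, nemo, pytorch, or tensorflow in place of nlp.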

Large Language Models

Many large language models are openly available for download, though some require specific software frameworks or acceptance of an end-user license agreement. Starter LLMs trained using Megatron-LM, along with Llama2 and Llama3, are located in the examples and reference data folder. The starter models, which include a 20B-parameter GPT and a 9B-parameter BERT, can be used as is, trained further, or fine-tuned to meet specific needs. For other recent LLMs, such as LLaMA, Gemma, and Mistral AI models, please submit a help ticket for details and support. You can also find more information on our AI Models page.


Examples and Reference Data

Please see the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Notebooks and batch scripts cover everything from pretraining and inference to summarization, information extraction, and topic modeling. Additional reference data, including benchmarks such as the popular SuperGLUE, is already available in /data/ai/benchmarks/nlp.
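
One way to browse these resources from a shell is sketched below. The two paths are the ones named on this page; the helper name `list_if_present` is hypothetical, and the directory check is there only so the snippet runs harmlessly on a machine where those paths do not exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a directory if it exists, otherwise report it missing.
list_if_present() {
    local dir="$1"
    if [ -d "${dir}" ]; then
        ls "${dir}"
    else
        echo "not found: ${dir}"
    fi
}

list_if_present /data/ai/examples/nlp     # NLP examples named on this page
list_if_present /data/ai/benchmarks/nlp   # NLP benchmark reference data
```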