Difference between revisions of "Data Science Platform"

From UFRC
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[Category:Software]][[Category:Machine Learning]][[Category:Data Science]][[Category:Help]]
+
[[Category:Data Science]][[Category:Help]]
 
{|align=right
 
{|align=right
 
   |__TOC__
 
   |__TOC__
 
   |}
 
   |}
This page describes the collection of Data Science platforms on HiperGator. Data science is the practice of using math, programming, analytics, AI, and machine learning to discover valuable insights within large data sets. Research Computing provides essential infrastructure, tools, and expertise to support data science research and accelerate impactful discoveries, or other uses via [https://support.rc.ufl.edu/ support requests] or [https://www.rc.ufl.edu/get-started/purchase-allocation/training--consultation-rates/ consulting].
+
 
 +
This page describes the collection of data science platforms on HiPerGator. Data science is the practice of using mathematics, programming, analytics, AI, and machine learning to discover valuable insights within large data sets. Research Computing provides essential infrastructure, tools, and expertise to support data science research and accelerate impactful discoveries. All support requests and inquiries are handled via [https://support.rc.ufl.edu/ support tickets] or [https://www.rc.ufl.edu/get-started/price-list/ consulting].
  
 
==Module platforms for Data Science==
 
==Module platforms for Data Science==
 
*'''SQL:''' Oracle [[SQL Developer]] is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. [[SQLite]] is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
 
**Use the command below to list the available versions on HiPerGator-AI.
 
**<pre>module spider sqldeveloper</pre>
 
**<pre>module spider sqlite</pre>
 
 
*'''[[R]]:''' R is a free software environment for statistical computing and graphics. It is favored in data science for its extensive suite of tools and packages for data manipulation, statistical modeling, and visualization, making it ideal for tasks ranging from simple data analysis to complex predictive modeling.
 
**Use the command below to list the available versions on HiPerGator-AI.
 
**<pre>module spider R</pre>
 
 
 
*'''[[Matlab]]:''' MATLAB is a powerful computing environment and programming language widely used in data science for its robust capabilities in numerical and matrix calculations, advanced data visualization, and the development of algorithms. It is particularly useful for applications in engineering and scientific research.
 
**Use the command below to list the available versions on HiPerGator-AI.
 
**<pre>module spider matlab</pre> or <pre>module spider mcr</pre>
 
 
  
 
*'''[[Python]]:''' Python is enhanced by extraordinary libraries such as NumPy, SciPy, Pandas, Matplotlib, Scrapy, and BeautifulSoup. These libraries are crucial for data science and are utilized daily by programmers to solve problems.
 
*'''[[Python]]:''' Python is enhanced by extraordinary libraries such as NumPy, SciPy, Pandas, Matplotlib, Scrapy, and BeautifulSoup. These libraries are crucial for data science and are utilized daily by programmers to solve problems.
Line 42: Line 28:
  
  
*'''[[Rapidsai]]:''' RAPIDS AI accelerates end-to-end data science pipelines by providing a familiar dataframe API. It supports machine learning integration without the typical serialization costs and enables multi-node, multi-GPU deployments for faster processing of large datasets.
+
*'''[[Rapidsai]]:''' RAPIDS AI is a suite of CUDA-enabled open-source software libraries and APIs by NVIDIA. It accelerates end-to-end data science pipelines by providing a familiar dataframe API. It supports machine learning integration without the typical serialization costs and enables multi-node, multi-GPU deployments for faster processing of large datasets.
 
**Use the command below to list the available versions on HiPerGator-AI.  
 
**Use the command below to list the available versions on HiPerGator-AI.  
 
**<pre>module spider rapidsai</pre>
 
**<pre>module spider rapidsai</pre>
 +
 +
 +
*'''[[R]]:''' R is a free software environment for statistical computing and graphics. It is favored in data science for its extensive suite of tools and packages for data manipulation, statistical modeling, and visualization, making it ideal for tasks ranging from simple data analysis to complex predictive modeling.
 +
**Use the command below to list the available versions on HiPerGator.
 +
**<pre>module spider R</pre>
 +
 +
 +
*'''[[Matlab]]:''' MATLAB is a powerful computing environment and programming language widely used in data science for its robust capabilities in numerical and matrix calculations, advanced data visualization, and the development of algorithms. It is particularly useful for applications in engineering and scientific research.
 +
**Use the command below to list the available versions on HiPerGator.
 +
**<pre>module spider matlab</pre> or <pre>module spider mcr</pre>
 +
 +
 +
*'''SQL:''' Oracle [[SQL Developer]] is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. [[SQLite]] is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
 +
**Use the command below to list the available versions on HiPerGator.
 +
**<pre>module spider sqldeveloper</pre>
 +
**<pre>module spider sqlite</pre>
 +
  
 
'''Note:''' If the environments or these platforms do not have the libraries you require, you may need to create a Conda environment. See [[Conda]] and [[Managing_Python_environments_and_Jupyter_kernels]] for more details.
 
'''Note:''' If the environments or these platforms do not have the libraries you require, you may need to create a Conda environment. See [[Conda]] and [[Managing_Python_environments_and_Jupyter_kernels]] for more details.

Latest revision as of 19:52, 14 May 2024

This page describes the collection of data science platforms on HiPerGator. Data science is the practice of using mathematics, programming, analytics, AI, and machine learning to discover valuable insights within large data sets. Research Computing provides essential infrastructure, tools, and expertise to support data science research and accelerate impactful discoveries. All support requests and inquiries are handled via support tickets or consulting.

Module platforms for Data Science

  • Python: Python is supported by a rich ecosystem of libraries such as NumPy, SciPy, Pandas, Matplotlib, Scrapy, and BeautifulSoup. These libraries are central to data science and are used daily by programmers to solve problems.
    • Use the command below to list the available versions on HiPerGator-AI.
    • module spider python


  • TensorFlow: TensorFlow is an open-source software library widely used for implementing artificial neural networks and deep learning. It is extensively utilized in data science to build and train complex machine learning models, providing scalable and flexible tools for deep learning, numerical computation, and large-scale optimization. TensorFlow also supports both research and production deployments.
    • Use the command below to list the available versions on HiPerGator-AI.
    • module spider tensorflow


  • Pytorch: PyTorch is a Python-based scientific computing package that leverages the power of graphics processing units. It is effectively used in various data science applications, particularly those involving complex numerical computations or the development of custom machine learning models.
    • Use the command below to list the available versions on HiPerGator-AI.
    • module spider pytorch


  • Scikit-learn: Scikit-learn is a collection of Python modules for machine learning and data mining. It offers simple and efficient tools for predictive data analysis and is built on NumPy, SciPy, and Matplotlib. Scikit-learn is open source and can be used commercially under a BSD license.
    • Use the command below to list the available versions on HiPerGator-AI.
    • module spider scikit-learn


  • Rapidsai: RAPIDS AI is a suite of CUDA-enabled open-source software libraries and APIs by NVIDIA. It accelerates end-to-end data science pipelines by providing a familiar dataframe API. It supports machine learning integration without the typical serialization costs and enables multi-node, multi-GPU deployments for faster processing of large datasets.
    • Use the command below to list the available versions on HiPerGator-AI.
    • module spider rapidsai


  • R: R is a free software environment for statistical computing and graphics. It is favored in data science for its extensive suite of tools and packages for data manipulation, statistical modeling, and visualization, making it ideal for tasks ranging from simple data analysis to complex predictive modeling.
    • Use the command below to list the available versions on HiPerGator.
    • module spider R


  • Matlab: MATLAB is a powerful computing environment and programming language widely used in data science for its robust capabilities in numerical and matrix calculations, advanced data visualization, and the development of algorithms. It is particularly useful for applications in engineering and scientific research.
    • Use the command below to list the available versions on HiPerGator.
    • module spider matlab
      or
      module spider mcr


  • SQL: Oracle SQL Developer is a free integrated development environment that simplifies the development and management of Oracle Database in both traditional and Cloud deployments. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
    • Use the command below to list the available versions on HiPerGator.
    • module spider sqldeveloper
    • module spider sqlite
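
The per-platform queries above can be combined into a single loop. The following is a minimal sketch, not an official Research Computing script: the function name spider_all and the DRY_RUN switch are hypothetical helpers introduced here for illustration, and the module names are copied from the list above. With DRY_RUN=1 (the default in this sketch) the commands are only printed, so it can be read or tried on a machine without the module system.

```shell
# DRY_RUN=1 (default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# spider_all is a hypothetical helper: it calls "module spider" once
# for every module name passed as an argument.
spider_all() {
  for name in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "module spider $name"
    else
      module spider "$name"
    fi
  done
}

# Module names taken from the platform list above.
spider_all python tensorflow pytorch scikit-learn rapidsai R matlab mcr sqldeveloper sqlite
```

Setting DRY_RUN=0 in a HiPerGator shell session should run the actual queries rather than print them.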


Note: If these platforms and their environments do not provide the libraries you require, you may need to create a Conda environment. See Conda and Managing_Python_environments_and_Jupyter_kernels for more details.
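
As a rough illustration of the note above, creating a Conda environment might look like the sketch below. Several things here are assumptions rather than documented HiPerGator practice: that a module named conda is available, the environment name ds-env, and the package list. The run helper and DRY_RUN switch are hypothetical and only print the commands by default. The Conda page linked above remains the authoritative reference for the recommended workflow.

```shell
# Hypothetical environment name and package set; adjust to your project.
ENV_NAME="${ENV_NAME:-ds-env}"
PACKAGES="${PACKAGES:-python=3.11 numpy pandas}"

# DRY_RUN=1 (default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper: echo the command in dry-run mode,
# otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumption: the cluster provides a module named "conda".
run module load conda

# $PACKAGES is intentionally unquoted so each package is a separate argument.
run conda create --yes --name "$ENV_NAME" $PACKAGES
```

With DRY_RUN=0 the same two commands are executed instead of printed; the environment can then be activated with conda activate followed by the environment name.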

Examples and Reference Data

Please see the /data/ai/ folder, AI_Examples, and AI_Reference_Datasets for helpful resources. Additional references, such as how to run RAPIDS on HiPerGator, are available in /data/ai/examples/rapids.