PCA Analysis

Version: 1.0.0

Commit Hash: d3114b71950d15d9e281e932450b65998c285ea4

Author: CauldronGO Team

Category: analysis

Principal Component Analysis for dimensionality reduction and visualization

README

PCA Analysis

Installation

Repository: https://github.com/noatgnu/pca-analysis-plugin

Manual installation:

  1. Open Cauldron
  2. Go to Plugins → Install from Repository
  3. Paste: https://github.com/noatgnu/pca-analysis-plugin
  4. Click Install

ID: pca-analysis
Version: 1.1.0
Category: analysis
Author: CauldronGO Team

Description

Principal Component Analysis for dimensionality reduction and visualization with optional clustering

Runtime

  • Environments: python

  • Entrypoint: pca.py

Inputs

Name | Label | Type | Required | Default | Visibility
--- | --- | --- | --- | --- | ---
input_file | Input File | file | Yes | - | Always visible
annotation_file | Sample Annotation File | file | Yes | - | Always visible
n_components | Number of Components | number (min: 2, max: 10, step: 1) | Yes | 2 | Always visible
log2 | Apply Log2 Transform | boolean | No | false | Always visible
cluster_method | Clustering Method | select (none, kmeans, dbscan) | No | none | Always visible
auto_k | Auto-detect Cluster Count | boolean | No | false | Visible when cluster_method = kmeans
n_clusters | Number of Clusters | number (min: 2, max: 20, step: 1) | No | 5 | Visible when cluster_method = kmeans
max_k | Max Clusters to Test | number (min: 3, max: 30, step: 1) | No | 10 | Visible when auto_k = true
dbscan_eps | DBSCAN Epsilon | number (min: 0, max: 10, step: 0) | No | 0.5 | Visible when cluster_method = dbscan
dbscan_min_samples | DBSCAN Min Samples | number (min: 2, max: 50, step: 1) | No | 5 | Visible when cluster_method = dbscan
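
The Visibility column amounts to a small set of rules. The sketch below encodes them, reading the column literally (so max_k depends only on auto_k). The helper name visible_params and the DEFAULTS dict are hypothetical illustrations, not part of the plugin.

```python
# Defaults copied from the Inputs table; DEFAULTS and visible_params are hypothetical names.
DEFAULTS = {
    "n_components": 2,
    "log2": False,
    "cluster_method": "none",
    "auto_k": False,
    "n_clusters": 5,
    "max_k": 10,
    "dbscan_eps": 0.5,
    "dbscan_min_samples": 5,
}

ALWAYS_VISIBLE = {"input_file", "annotation_file", "n_components", "log2", "cluster_method"}


def visible_params(params: dict) -> set[str]:
    """Return the inputs shown for a given selection, following the Visibility column."""
    p = {**DEFAULTS, **params}
    shown = set(ALWAYS_VISIBLE)
    if p["cluster_method"] == "kmeans":
        shown |= {"auto_k", "n_clusters"}
    if p["auto_k"]:  # the table ties max_k to auto_k only
        shown.add("max_k")
    if p["cluster_method"] == "dbscan":
        shown |= {"dbscan_eps", "dbscan_min_samples"}
    return shown


print(sorted(visible_params({"cluster_method": "kmeans", "auto_k": True})))
```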

Input Details

Input File (input_file)

Data matrix file with one column per sample and one row per measurement; the columns to analyze are selected by the annotation file

Sample Annotation File (annotation_file)

Annotation file whose Sample column lists the input-file columns to analyze, plus Condition and Batch columns used for grouping
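
A minimal sketch of how the Sample column could select columns from the data matrix, assuming both files are tab-separated and the first data column holds row identifiers. The inline data, the Protein and Extra column names, and the transpose step are illustrative assumptions, not the plugin's confirmed code.

```python
import io

import pandas as pd

# Hypothetical inputs: a measurements x samples matrix and its annotation table.
data_tsv = "Protein\tS1\tS2\tS3\tExtra\nP1\t10\t20\t30\t99\nP2\t5\t6\t7\t99\n"
annotation_tsv = "Sample\tCondition\tBatch\nS1\tctrl\tb1\nS2\tctrl\tb1\nS3\ttreat\tb2\n"

data = pd.read_csv(io.StringIO(data_tsv), sep="\t", index_col=0)
annotation = pd.read_csv(io.StringIO(annotation_tsv), sep="\t")

# Keep only the columns named in the Sample column, in annotation order ("Extra" is dropped).
samples = annotation["Sample"].tolist()
matrix = data[samples]

# PCA treats rows as observations, so transpose to samples x measurements.
X = matrix.T.to_numpy(dtype=float)

# Grouping labels aligned with the rows of X, e.g. for coloring a plot by Condition.
conditions = annotation.set_index("Sample").loc[samples, "Condition"].tolist()
```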

Number of Components (n_components)

Number of principal components to compute

Apply Log2 Transform (log2)

Apply log2 transformation to the data before PCA
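
The sketch below shows a log2 transform followed by PCA with scikit-learn, one of the listed dependencies. It assumes positive intensities; how the plugin handles zeros or missing values (log2 of a non-positive value is -inf or NaN) is not documented. scikit-learn's PCA centers each feature but does not scale it; whether the plugin standardizes first is also not stated. scikit-learn rejects an n_components larger than min(n_samples, n_features), so very small experiments cannot use the full range of 2 to 10.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical positive intensities: 6 samples x 50 measurements.
X = rng.lognormal(mean=10, sigma=1, size=(6, 50))

apply_log2 = True  # mirrors the log2 input
if apply_log2:
    X = np.log2(X)

pca = PCA(n_components=3)  # mirrors n_components
coords = pca.fit_transform(X)  # one row of PC coordinates per sample
ratios = pca.explained_variance_ratio_  # fraction of variance per component, largest first
```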

Clustering Method (cluster_method)

Method to detect clusters from PCA embeddings

  • Options: none, kmeans, dbscan

Auto-detect Cluster Count (auto_k)

Automatically determine the optimal number of clusters using the elbow method

Number of Clusters (n_clusters)

Number of clusters for KMeans clustering (ignored if auto-detect is enabled)

Max Clusters to Test (max_k)

Maximum number of clusters to test with the elbow method
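
The README does not say which elbow criterion the plugin applies. As one common heuristic, the sketch below fits KMeans for k = 1..max_k on the PCA coordinates, normalizes the inertia curve, and picks the k farthest below the straight line joining its first and last points. The k range, the criterion, and the function names inertia_curve and elbow_k are assumptions; when auto_k is off, the final fit would simply use n_clusters.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(coords, max_k, seed=0):
    """Within-cluster sum of squares (inertia) for k = 1..max_k."""
    ks = list(range(1, max_k + 1))
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(coords).inertia_ for k in ks]
    return ks, inertias


def elbow_k(ks, inertias):
    """Pick the k whose normalized point lies farthest below the first-to-last chord."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(inertias, dtype=float)
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # Assuming inertia falls with k, the chord runs from (0, 1) to (1, 0), i.e. x + y = 1,
    # and a point's distance below it is proportional to 1 - x - y.
    return ks[int(np.argmax(1.0 - x_n - y_n))]


# Hypothetical PCA coordinates: three well-separated groups of 10 samples.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
coords = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])

ks, inertias = inertia_curve(coords, max_k=6)
k = elbow_k(ks, inertias)
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
```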

DBSCAN Epsilon (dbscan_eps)

Maximum distance between two samples for one to count as part of the other's DBSCAN neighborhood

DBSCAN Min Samples (dbscan_min_samples)

Minimum number of samples in a neighborhood for a point to count as a DBSCAN core point
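
A minimal sketch mapping dbscan_eps and dbscan_min_samples onto scikit-learn's DBSCAN; that the plugin calls it this way is an assumption based on its dependency list. Samples that belong to no dense neighborhood receive the label -1 (noise). In scikit-learn the min_samples count includes the point itself, so each five-point group below forms a cluster with min_samples=5. Because eps is measured in PCA-coordinate units, a value that suits log2-transformed data may not suit untransformed data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical PCA coordinates: two tight groups of five samples plus one outlier.
coords = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.05, 5.05],
    [20.0, 20.0],
])

# eps and min_samples mirror the dbscan_eps and dbscan_min_samples defaults.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(coords)
```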

Outputs

Name | File | Type | Format | Description
--- | --- | --- | --- | ---
pca_output | pca_output.txt | data | tsv | PCA coordinates for each sample with optional cluster assignments
explained_variance | explained_variance_ratio.json | data | json | Variance explained by each principal component
pca_3d | pca_3d.html | html | html | Interactive 3D PCA visualization (generated when n_components >= 3)
elbow_plot | elbow_plot.html | html | html | Elbow plot for optimal cluster count (generated when auto_k is enabled)
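
A sketch of how the two data outputs could be written. The file names and formats come from the table; the column names (Sample, PC1, PC2, Cluster) and the plain-list JSON layout are hypothetical, since the README does not document them. Files go to a temporary directory here.

```python
import json
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical results for three samples; real values come from the PCA and clustering steps.
samples = ["S1", "S2", "S3"]
coords = np.array([[1.0, 0.5], [-0.5, 0.2], [-0.5, -0.7]])
ratios = np.array([0.8, 0.15])
labels = np.array([0, 0, 1])

out_dir = Path(tempfile.mkdtemp())

# pca_output.txt: one tab-separated row per sample (column names are assumptions).
table = pd.DataFrame(coords, columns=[f"PC{i + 1}" for i in range(coords.shape[1])])
table.insert(0, "Sample", samples)
table["Cluster"] = labels  # omitted when cluster_method = none
table.to_csv(out_dir / "pca_output.txt", sep="\t", index=False)

# explained_variance_ratio.json: one fraction per component (list layout is an assumption).
(out_dir / "explained_variance_ratio.json").write_text(json.dumps(ratios.tolist()))
```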

Sample Annotation

This plugin supports sample annotation:

  • Annotation File: annotation_file

Visualizations

This plugin generates 2 plots:

PCA Plot (by Condition) (pca-scatter-condition)

  • Type: scatter
  • Data Source: pca_output
  • Default: Yes
  • Customization Options: 9 available

PCA Plot (by Cluster) (pca-scatter-cluster)

PCA scatter plot colored by detected clusters

  • Type: scatter
  • Data Source: pca_output
  • Customization Options: 9 available

Requirements

  • Python Version: >=3.11

Package Dependencies (Inline)

Packages are defined inline in the plugin configuration:

  • numpy>=1.24.0
  • pandas>=2.0.0
  • scikit-learn>=1.3.0
  • plotly>=5.18.0

Note: When you create a custom environment for this plugin, these dependencies will be automatically installed.

Example Data

This plugin includes example data for testing:

  cluster_method: kmeans
  auto_k: true
  max_k: 6
  input_file: examples/diann/imputed.data.txt
  annotation_file: examples/diann/annotation.txt
  n_components: 3
  log2: true

Load example data by clicking the Load Example button in the UI.

Usage

Via UI

  1. Navigate to analysis → PCA Analysis
  2. Fill in the required inputs
  3. Click Run Analysis

Via Plugin System

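// Parameter names follow the Inputs table; values mirror the bundled example data.
// Passing file inputs as plain paths is an assumption; your setup may reference files differently.
const jobId = await pluginService.executePlugin('pca-analysis', {
  input_file: 'examples/diann/imputed.data.txt', annotation_file: 'examples/diann/annotation.txt',
  n_components: 3, log2: true, cluster_method: 'kmeans', auto_k: true, max_k: 6,
});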