
Publicly available tools for the analysis of nifH gene sequences

Nitrogen fixation is catalyzed by the enzyme nitrogenase. Conventional nitrogenase is composed of two multisubunit metalloproteins: the molybdenum-iron protein (Component 1) and the iron (Fe) protein (Component 2). Alternative nitrogenases contain vanadium in place of molybdenum in Component 1 (alternative nitrogenases) or only iron (second alternative nitrogenases). Nitrogenase is likely an ancient enzyme, since it is distributed widely throughout Bacteria and Archaea. There are examples of possible or probable lateral gene transfer, but most of these likely occurred early in evolution. Both nitrogenase proteins are highly conserved, but the Fe protein, composed of two identical subunits encoded by nifH, is the most highly conserved.



ARBitrator is a semi-automatic pipeline for the retrieval and curation of nifH sequences.


Heller et al. (2014) developed a semi-automatic pipeline for the curation of nifH sequences from GenBank, which is used to update our publicly available database semi-annually.

If you have questions about ARBitrator or the nifH database, email Kendra Turk-Kubo.

This information was last updated in March 2022.

nifH database

Nitrogenase genes have been sequenced from complete genomes and genomic libraries, and amplified from individual organisms and the environment using the polymerase chain reaction (PCR). There are now thousands of nitrogenase gene sequences in the public databases. We maintain an alignment of nifH genes in an ARB database. We update this database frequently, and it is available for download below. Please be sure to read the documentation, which explains how sequences were chosen and aligned and describes the basic features of the ARB nifH database. Please note that we now include a FASTA file and a metadata file from the most recent update.



We commonly use full-length nifH sequences obtained from cultivated diazotrophs to quickly assign nifH clusters to newly obtained nifH sequences and to identify their closest cultivated relatives via BLASTX. This approach must be used with caution, however: the closest cultivated diazotroph often does not share high sequence similarity with an unknown nifH phylotype, in which case it cannot be used for cluster identification. A FASTA file and a metadata file are available at the link below.
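The caution above can be built into a post-processing step. The sketch below is an illustration, not part of the distributed files. It assumes BLASTX was run with the standard 12-column tabular output (`-outfmt 6`), keeps the highest-bitscore hit per query, and flags hits whose percent identity falls below a cutoff. The function name `best_hits`, the sequence identifiers, and the 90% cutoff are hypothetical placeholders; choose a cutoff suited to your own data.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    subject: str             # ID of the cultivated-diazotroph reference sequence
    percent_identity: float  # amino acid identity reported by BLASTX
    bitscore: float
    confident: bool          # True if identity meets the (assumed) cutoff


def best_hits(lines: Iterable[str], min_identity: float = 90.0) -> Dict[str, Hit]:
    """Keep the top-scoring hit per query from BLAST tabular (-outfmt 6) lines.

    Assumed column order (the BLAST+ default for outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The 90.0 default for min_identity is an illustrative assumption only.
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query].bitscore:
            best[query] = Hit(subject, pident, bitscore, pident >= min_identity)
    return best


# Made-up example rows; the IDs are placeholders, not real database entries.
example = [
    "seq1\tcultured_A\t97.5\t108\t3\t0\t1\t324\t10\t117\t1e-60\t210",
    "seq1\tcultured_B\t88.0\t108\t13\t0\t1\t324\t10\t117\t1e-50\t180",
    "seq2\tcultured_C\t71.3\t108\t31\t0\t1\t324\t10\t117\t1e-30\t120",
]
hits = best_hits(example)
for query, hit in hits.items():
    note = "ok" if hit.confident else "low identity: do not assign a cluster from this hit"
    print(query, hit.subject, hit.percent_identity, note)
```

Queries flagged as low identity are better placed by phylogenetic analysis against the full alignment than by their closest cultivated relative.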



Rapid annotation of nifH gene sequences using Classification and Regression Trees (CART).

Dr. Ildiko Frank has developed a novel approach that uses decision tree-type statistical models to rapidly classify nifH amino acid sequences into well-defined phylogenetic clusters, providing a common platform for comparative analysis across studies.
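To illustrate the general idea of a decision tree applied to aligned amino acid sequences, here is a toy sketch. It is not the published model. The alignment positions, residue sets, and cluster labels (`cluster_A`, `cluster_B`, `cluster_C`) are invented for illustration, and the real splits should be taken from the script provided below.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass
class Leaf:
    label: str  # cluster assigned when a sequence reaches this leaf


@dataclass
class Node:
    position: int             # 0-based column in the aligned amino acid sequence
    residues: FrozenSet[str]  # residues that send a sequence down the `yes` branch
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], aligned_seq: str) -> str:
    """Walk the tree, testing one alignment column per internal node."""
    node = tree
    while isinstance(node, Node):
        residue = aligned_seq[node.position]
        node = node.yes if residue in node.residues else node.no
    return node.label


# Hypothetical tree: positions 2 and 5 and their residue sets are made up.
TOY_TREE = Node(
    position=2,
    residues=frozenset("AG"),
    yes=Leaf("cluster_A"),
    no=Node(
        position=5,
        residues=frozenset("C"),
        yes=Leaf("cluster_B"),
        no=Leaf("cluster_C"),
    ),
)

for seq in ("MKGTTLVA", "MKSTTCVA", "MKSTTLVA"):
    print(seq, classify(TOY_TREE, seq))
```

Because each decision inspects a single alignment column, classification requires only a handful of lookups per sequence, which is what makes this kind of model fast. It does, however, require that query sequences be aligned to the same coordinates as the sequences used to build the tree.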


This study has been published (Frank et al., 2016), and the script for the model is available below.

If you have questions about this model, email Ildiko Frank.
