Publicly available tools for the analysis of nifH gene sequences
Curated nifH database
Last Update: June 2017

Nitrogenase genes have been sequenced in complete genomes, recovered from genome libraries, and amplified from individual organisms and the environment using the polymerase chain reaction. There are now thousands of nitrogenase genes in the public databases. We maintain an alignment of nifH genes in an ARB database. We update this database frequently, and it is available for download below. Please be sure to read the documentation, which explains how sequences were chosen and aligned and describes the basic features of the ARB nifH database. Please note that we now include a FASTA file and a metadata file from the most recent update.

We commonly use full-length nifH sequences obtained from cultivated diazotrophs to quickly assign nifH clusters and to identify the closest cultivated relatives of newly obtained nifH sequences via BLASTX. This approach must be used with caution, however, because the closest cultivated diazotroph often does not share high sequence similarity with an unknown nifH phylotype, in which case it cannot be used for cluster identification. A FASTA file and a metadata file are available at the link below.
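As a minimal sketch of the assignment step described above, the Python below parses BLASTX tabular output (the standard 12-column `-outfmt 6` layout) and transfers a cluster label from the best hit only when the percent identity is high enough. The sequence identifiers, the reference-to-cluster mapping, and the 90% identity cutoff are hypothetical placeholders chosen for illustration; they are not values taken from our database or its documentation.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv):
    """Return the highest-bitscore hit per query from -outfmt 6 text."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def assign_clusters(blast_tsv, ref_cluster, min_identity=90.0):
    """Map each query to (closest reference, cluster label or None).

    min_identity is an assumed, illustrative cutoff, not a recommended value.
    """
    result = {}
    for query, hit in best_hits(blast_tsv).items():
        cluster = ref_cluster.get(hit["sseqid"])
        if hit["pident"] < min_identity:
            cluster = None  # closest cultivated relative is too distant
        result[query] = (hit["sseqid"], cluster)
    return result


# Made-up example data: identifiers, scores, and clusters are illustrative.
EXAMPLE_TSV = (
    "otu1\trefA\t97.5\t108\t3\t0\t1\t324\t30\t137\t1e-60\t210\n"
    "otu1\trefB\t88.0\t108\t13\t0\t1\t324\t30\t137\t1e-50\t180\n"
    "otu2\trefB\t71.3\t108\t31\t0\t1\t324\t30\t137\t1e-35\t130\n"
)
EXAMPLE_CLUSTERS = {"refA": "cluster I", "refB": "cluster III"}

if __name__ == "__main__":
    hits = assign_clusters(EXAMPLE_TSV, EXAMPLE_CLUSTERS)
    for query, (ref, cluster) in hits.items():
        print(query, ref, cluster or "unassigned")
```

Keeping the closest reference even when no cluster is assigned makes the low-similarity cases easy to spot and review by hand, which is where the caution noted above applies.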






ARBitrator: A semi-automatic pipeline for the retrieval and curation of nifH sequences

Last Update: February 2020

Heller et al. (2014) developed a semi-automatic pipeline for the curation of nifH sequences from GenBank, which is used to update our publicly available database semi-annually.






If you have questions about ARBitrator or the nifH database, email Kendra Turk-Kubo at kturk AT ucsc DOT edu.




Rapid annotation of nifH gene sequences using Classification and Regression Trees (CART)


Dr. Ildiko Frank has developed a novel approach to rapidly classify nifH amino acid sequences into well-defined phylogenetic clusters - using decision tree-type statistical models - that provides a common platform for comparative analysis across studies.
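To illustrate the general idea of a decision tree applied to aligned amino acid sequences, here is a minimal Python sketch. It is not the published model: the training fragments, cluster labels, and the choice of one-residue-versus-rest splits scored by Gini impurity are all assumptions made for this toy example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(seqs, labels):
    """Find the (position, residue) test with the lowest weighted Gini."""
    best = None
    for pos in range(len(seqs[0])):
        for aa in sorted({s[pos] for s in seqs}):
            left = [l for s, l in zip(seqs, labels) if s[pos] == aa]
            right = [l for s, l in zip(seqs, labels) if s[pos] != aa]
            if not left or not right:
                continue  # this test does not separate anything
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, pos, aa)
    return best


def grow(seqs, labels):
    """Recursively grow a tree; a leaf is simply a cluster label."""
    split = None if len(set(labels)) == 1 else best_split(seqs, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    _, pos, aa = split
    yes = [(s, l) for s, l in zip(seqs, labels) if s[pos] == aa]
    no = [(s, l) for s, l in zip(seqs, labels) if s[pos] != aa]
    return (pos, aa, grow(*zip(*yes)), grow(*zip(*no)))


def classify(tree, seq):
    """Walk the tree until a leaf (a cluster label) is reached."""
    while isinstance(tree, tuple):
        pos, aa, yes, no = tree
        tree = yes if seq[pos] == aa else no
    return tree


# Toy aligned fragments and labels, invented for illustration only.
TRAIN_SEQS = ["ACDEFG", "ACDEFH", "ACKEFG", "ACKEYG", "SCDEYG", "SCDEYH"]
TRAIN_LABELS = ["I", "I", "II", "II", "III", "III"]

if __name__ == "__main__":
    tree = grow(TRAIN_SEQS, TRAIN_LABELS)
    print(tree)
    print(classify(tree, "ACKEFH"))
```

Each internal node asks whether a given alignment position holds a given residue, so a new sequence is classified with a handful of lookups, which is what makes tree-type models fast to apply.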


This study has been published (Frank et al., 2016), and the script for the model is available below.


If you have questions about this model, email Ildiko Frank at frankildiko AT hotmail DOT com.













Nitrogen fixation is catalyzed by the enzyme nitrogenase. Nitrogenase is composed of two multisubunit metalloproteins, called the molybdenum-iron protein (Component 1) and the iron protein (Component 2). These are the conventional nitrogenases. Alternative nitrogenases contain vanadium in place of molybdenum in Component 1 (alternative nitrogenases) or only iron (second alternative nitrogenases). Nitrogenase is likely to be an ancient enzyme, since it is distributed widely throughout Bacteria and Archaea. There are examples of possible or probable lateral gene transfer, but most of these likely occurred early in evolution. Both of the nitrogenase proteins are highly conserved, but the Fe protein, composed of two identical subunits encoded by nifH, is the most highly conserved.


From InterPro IPR000392:

"Nitrogen fixing bacteria possess a nitrogenase enzyme complex that catalyzes the reduction of molecular nitrogen to ammonia [PUBMED:2672439, PUBMED:6327620, Norel and Elmerich].  The nitrogenase enzyme complex consists of two components:

  • Component I is nitrogenase MoFe protein or dinitrogenase, which contains 2 molecules each of 2 non-identical subunits

  • Component II (nitrogenase Fe protein or dinitrogenase reductase) is a homodimer, the monomer being coded for by the nifH gene PUBMED:6327620.


Component II has 2 ATP-binding domains and one 4Fe-4S cluster per homodimer: it supplies energy by ATP hydrolysis, and transfers electrons from reduced ferredoxin or flavodoxin to component I for the reduction of molecular nitrogen to ammonia PUBMED:2491672. There are a number of conserved regions in the sequence of these proteins: in the N-terminal section there is an ATP-binding site motif 'A' (P-loop) and in the central section there are two conserved cysteines which have been shown, in nifH, to be the ligands of the 4Fe-4S cluster."
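The conserved regions mentioned in the quoted description can be located with a simple pattern search. The sketch below scans a protein sequence for the Walker A (P-loop) consensus G-x(4)-G-K-[ST] and lists cysteine positions. The short fragment is an illustrative example rather than a record from the database, and listing cysteines only yields candidates: it does not by itself show which residues are the cluster ligands.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_p_loop(protein):
    """Return (0-based start, matched text) of the first Walker A motif, or None."""
    match = WALKER_A.search(protein.upper())
    return (match.start(), match.group()) if match else None


def cysteine_positions(protein):
    """Return the 0-based index of every cysteine in the sequence."""
    return [i for i, aa in enumerate(protein.upper()) if aa == "C"]


# Illustrative N-terminal fragment (an assumption, not a database entry).
FRAGMENT = "MRQCAIYGKGGIGKSTT"

if __name__ == "__main__":
    print(find_p_loop(FRAGMENT))
    print(cysteine_positions(FRAGMENT))
```

For real data, positions are easier to compare across sequences when they are reported against a shared alignment rather than against each raw sequence.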


Nitrogenase genes


Nitrogenase genes form a closely related family that likely arose from a common ancestor (nifH, nifD, nifK, nifE, nifN, and others). Alternative nitrogenases also contain a subunit encoded by anfG (in the alternative nitrogenases, the nifH, nifD, and nifK genes are termed anfH, anfD, and anfK, respectively). Because it is the most highly conserved, nifH has been the target of most studies, particularly environmental studies. As a result, there are now tens of thousands of nifH genes available in GenBank. Recovering these genes, their coding regions, and their metadata, and aligning them in a coherent manner, is problematic. This is largely due to the lack of shared conventions for data storage among the major genomic repositories, and to the large volume of legacy data in which information is presented inconsistently.



Nitrogenase phylogeny

Nitrogenase genes are distributed throughout the prokaryotes, including representatives of the Archaea as well as the Bacteria, among them the Cyanobacteria. Although the phylogeny of nifH largely reflects the phylogeny of organisms based on ribosomal RNA genes, there are some differences. One deeply branching cluster is anomalous and is likely to represent an independent line of evolution; it includes some sequences from Gram-positive organisms, such as Clostridium. Because nitrogenase gene sequences do reflect phylogenetic affiliation, they can be used to identify the types of nitrogen-fixing microorganisms in different habitats.











Nitrogenase MoFe Protein from Azotobacter vinelandii

Image from NCBI MMDB | Einsle et al., 2002



Ocean Sciences Department

1156 High Street

University of California, Santa Cruz, CA 95064

© 2015 by the Zehr Laboratory