I review the physical processes involved in massive star and star cluster formation. I describe how these are combined in a theoretical model of massive star formation, Core Accretion, which assumes that this process is a scaled-up version of low-mass star formation. The assumption that the initial conditions are massive starless cores near virial equilibrium can be tested by studies of Infrared Dark Clouds. I show some of our latest observations of these clouds, including data from ALMA. At later stages, when the protostar is forming and becoming infrared bright, I show how the morphology is determined by bipolar outflow cavities. Their appearance at wavelengths from ~10 to 40 microns tests the properties of the core immediately surrounding the massive protostar. The predictions of the model appear to be validated in at least several nearby examples. The case of the massive protostar in Orion KL is more complex, but I discuss how it can also be understood in the context of Core Accretion theory. Finally, I discuss the application of massive star formation theory to the early universe: how massive were the first stars, and could they have been the progenitors of supermassive black holes?