Thinking of things like the Phantom boat or Charlie’s beds, is there a standard location system for non-tracked automation? Is it RTK, computer vision or something else?

Human vision, usually!

The Phantom boat is driven by a stage hand.

Stage trucks are often driven directly by stage crew who know the marks they need to hit and make sure they hit them within a reasonable margin of error. They're basically remote-controlled cars.
It takes practice and rehearsal, of course, but it is more than good enough for most purposes and allows some flexibility in case actors or other stage elements miss their marks!

There’s nothing “standard” in theatre automation, since every theatre, show and concert has its own unique set of requirements. Some shows give the driver line of sight and do it manually; some use IR or laser line of sight; some have LiDAR and related local scanning systems; some are hot wire followers feeding position data back to show synchronisation; some use a grid of position transceivers embedded in the deck; some are wire or job controlled (masked by other scenic and decorative items); and some use one, two or all of the above systems.

Your question is like asking “what is the standard sort of microphone used in shows?” There’s no single answer other than “whichever one can do the job and meets the budget of the particular show.”

