For the queuing system, you can use participant metadata to handle this. When a participant enters the queue, you could include a field with the current timestamp in their metadata. That should let each client sort the queue by join time on its own side.
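As a rough sketch of that client-side sort: the snippet below assumes the metadata is a JSON string and that the timestamp lives in a field called `queuedAt` (milliseconds since epoch). The `QueuedParticipant` shape, the field name, and both function names are made up for illustration, so adapt them to whatever your participant objects actually look like.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface QueuedParticipant {
  identity: string;
  // Assumed to be a JSON string such as '{"queuedAt": 1700000000000}'.
  metadata?: string;
}

// Read the assumed `queuedAt` field; return null if it is missing or malformed.
function queuedAt(p: QueuedParticipant): number | null {
  if (!p.metadata) return null;
  try {
    const parsed: unknown = JSON.parse(p.metadata);
    if (typeof parsed !== "object" || parsed === null) return null;
    const value = (parsed as Record<string, unknown>).queuedAt;
    return typeof value === "number" && Number.isFinite(value) ? value : null;
  } catch {
    return null; // metadata was not valid JSON
  }
}

// Return only the participants that are in the queue, earliest first.
// Ties on the timestamp fall back to identity so every client gets the same order.
function sortQueue(participants: QueuedParticipant[]): QueuedParticipant[] {
  return participants
    .map((p) => ({ p, t: queuedAt(p) }))
    .filter((e): e is { p: QueuedParticipant; t: number } => e.t !== null)
    .sort((a, b) => a.t - b.t || a.p.identity.localeCompare(b.p.identity))
    .map((e) => e.p);
}
```

You would re-run `sortQueue` whenever a participant joins, leaves, or has their metadata updated. One caveat: if each client writes its own timestamp, the order depends on those clients' clocks, so setting the timestamp from a single place (for example your backend) is likely to be more consistent.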
On synchronization, yes, it'd be better to make sure that the music and the user's singing come from the same machine. That is the best way to ensure there is no unexpected latency between different sources.