This issue goes beyond RL development, and can actually be discussed at the level of game development in general.
I've been interested in gamedev for a while and have read quite a bit, but it turns out that most tutorials and books out there will essentially teach you an API, but never really tell you how your code should be organised. It's like all that matters is that you can render things.
Fortunately, I came across a review of a book that said the book was more about how to organise code than how to render specific things. I've been reading through it and it led me to completely rework the RL I'm currently developing.
The book is called Game Coding Complete (4th ed.), and I can heartily recommend it if you already have some programming experience. It's for C++ and DirectX, whereas I want Python and SFML, but it is entirely possible to ignore the technical aspects and focus on the organisational ones.
They essentially espouse the MVC pattern (model-view-controller): the model is the game itself (no graphics involved); the view can be the user interface/rendering for the human player, a network interface for a remote player, or an interface for an AI; and the controller deals with inputs (for the player) and possibly AI decisions.
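To make that split concrete, here's a minimal Python skeleton of the three pieces. The class names and method signatures are mine, not the book's, and the bodies are just placeholders:

```python
class GameModel:
    """The game itself: map, actors, rules. Knows nothing about rendering."""
    def __init__(self):
        self.actors = []

    def update(self, dt):
        # advance time, apply game rules, move actors, etc.
        pass


class HumanView:
    """Draws the model's state for a human player and shows the UI."""
    def __init__(self, model):
        self.model = model

    def render(self):
        # draw tiles/sprites based on the model's current state
        pass


class PlayerController:
    """Turns raw input (keyboard, mouse) into commands for the model."""
    def __init__(self, model):
        self.model = model

    def handle_input(self, key):
        # translate a keypress into a game command, e.g. "move north"
        pass
```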
I highly recommend you just find the book somewhere and give chapters 1-3, 5-7, 10 and 11 a read, but the two main things related to your question are the following:
Have an EventManager. The model is doing its own thing, with time passing, actors moving, etc etc. Whenever anything of interest happens, send the EventManager an Event reporting it. If an actor moved, if an actor was created, if an attack occurred, etc etc etc. The EventManager is going to distribute these events to whoever is interested in them. Your game classes tell the EventManager what events they are interested in, and what function to call back once an Event is triggered.
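A bare-bones version of such an EventManager might look like this. This is my own Python sketch, not the book's C++ implementation, and I'm using plain strings as event types to keep it short:

```python
from collections import defaultdict


class EventManager:
    """Routes events from whoever fires them to whoever registered interest."""

    def __init__(self):
        self._listeners = defaultdict(list)  # event type -> list of callbacks

    def register(self, event_type, callback):
        """A game object says: call `callback` whenever `event_type` happens."""
        self._listeners[event_type].append(callback)

    def unregister(self, event_type, callback):
        if callback in self._listeners[event_type]:
            self._listeners[event_type].remove(callback)

    def post(self, event_type, **data):
        """The model reports that something happened; fan it out to listeners."""
        for callback in list(self._listeners[event_type]):
            callback(**data)
```

The model never knows (or cares) who is listening; it just posts "an actor moved" and moves on.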
For example, your rendering system is attached to the player's View. The View is interested in all events of type Move (which it uses to update the locations of its sprites), of type Create (so it can add another visual representation for the new object), of type Attack (so it can write "you do 5 damage" to the screen), and so on.
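Fleshing out the HumanView from the sketch above (again, the names and event fields here are my own assumptions), the registration and handlers could look like:

```python
class HumanView:
    def __init__(self, event_manager):
        self.sprites = {}  # actor id -> sprite / screen position
        event_manager.register("create", self.on_create)
        event_manager.register("move", self.on_move)
        event_manager.register("attack", self.on_attack)

    def on_create(self, actor_id, x, y, **_):
        self.sprites[actor_id] = (x, y)   # add a visual representation

    def on_move(self, actor_id, x, y, **_):
        self.sprites[actor_id] = (x, y)   # update the sprite's location

    def on_attack(self, attacker, target, damage, **_):
        print(f"{attacker} hits {target} for {damage} damage")  # message log
```

An AI "view" or a network view would register for the same events but do something completely different with them.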
For animations, you're likely to need something else, which changes the main loop. The authors of the book suggest using a ProcessManager, for anything that takes more than one "frame" or update to process. This can include things like AI routines that stop executing after a little bit so they don't hog a processor, or animations. Suppose an actor throws a grenade and it explodes. The model will deal damage to everything within a given radius, and then will generate an Explosion event. The Human View (which previously registered for Explosion events) will receive the event, and create a new Process whose sole purpose is to draw an explosion animation at the given location (by choosing frame after frame after frame as time passes). Once the explosion is complete, the process kills itself off.
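As a rough sketch of that idea (my own simplification, not the book's actual code), a Process/ProcessManager pair plus the explosion example could be as simple as:

```python
class Process:
    """Anything that needs more than one update to finish (animations, timed AI, ...)."""

    def __init__(self):
        self.dead = False

    def update(self, dt):
        raise NotImplementedError

    def kill(self):
        self.dead = True


class ExplosionAnimation(Process):
    """Plays an explosion at (x, y), one frame per update, then removes itself."""

    def __init__(self, x, y, frames):
        super().__init__()
        self.x, self.y = x, y
        self.frames = frames          # list of images/sprites to cycle through
        self.current = 0

    def update(self, dt):
        # draw self.frames[self.current] at (self.x, self.y) here
        self.current += 1
        if self.current >= len(self.frames):
            self.kill()               # animation finished, process removes itself


class ProcessManager:
    """Ticks every live process each frame and discards the finished ones."""

    def __init__(self):
        self.processes = []

    def attach(self, process):
        self.processes.append(process)

    def update(self, dt):
        for p in self.processes:
            p.update(dt)
        self.processes = [p for p in self.processes if not p.dead]
```

The Human View's Explosion handler then just does something like `process_manager.attach(ExplosionAnimation(x, y, explosion_frames))` and forgets about it; the main loop calls `process_manager.update(dt)` every frame.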
So that's how the authors do it... at least in my interpretation of what they explained, but to be honest I did more of a quick perusal than a deep study, so I might be wrong. But conceptually this seems to make a lot of sense, and to be generalisable to many gamedev scenarios.
Hope this helps!