-
Notifications
You must be signed in to change notification settings - Fork 249
Description
context
First of all, thanks very much for the great tool, absolutely love the termite plot.
Currently I am using this for Japanese topic modelling and using GuidedLDA, which allows you to seed topic words and conforms with sklearn's interface (most of it). So it leads to 2 feature requests (currently I just hacked it :p ).
Enable setting RC_PARAMS within textacy/viz/termite.py
when try to display Japanese characters, the term label were not displayed correctly. To solve this, we need to set
plt.rcParams['font.family'] = <one of installed font supports japanese>
# in my case plt.rcParams['font.family'] = 'Arial Unicode MS'but currently it is not possible with termite plot, because the line in textacy/viz/termite.py
with plt.rc_context(RC_PARAMS):
...will reset the rc_context and the parameter RC_PARAMS is hardcoded in, and I have to get in and change 'font.family': ['Arial Unicode MS'],.
proposed solution
It would be good to enable setting the RC_PARAMS either through environment variables or just allow it to be passed in in some way. If it is required that the style is kept the same for termite plot, then would be good at least enable setting the font so that other languages can be displayed correctly.
Allows the TopicModel class to be constructed with other topic models.
As mentioned above I'm using GuidedLDA, it mostly conform with the interface for sklearn models. (has components_, fit, transform, n_topics etc.) but currently due to the construction method in TopicModel, it is not possible:
def __init__(self, model, n_topics=10, **kwargs):
if isinstance(model, (NMF, LatentDirichletAllocation, TruncatedSVD)): # <= this line enforce class type instead of allowing duck typing
self.model = model
else:
self.init_model(model, n_topics=n_topics, **kwargs)proposed solution
maybe just change the constructor of TopicModel to:
def __init__(self, model, n_topics=10, **kwargs):
if isinstance(model, (NMF, LatentDirichletAllocation, TruncatedSVD)):
self.model = model
elif all([hasattr(model, required_attr) for required_attr in {"components_", "fit", "n_topics"}]):
self.model = model
else:
self.init_model(model, n_topics=n_topics, **kwargs)and of course, change the error message when things go wrong :p
Hope the above makes sense. Again, thanks a lot for the great work, really likes the termite plot!
Let me know what do you think!