Text analysis

I have a survey where the answers to many of the questions are open-ended. 
So I have hundreds of responses for each question.
I created a word cloud, to see the responses, but I want to be able to do a sentiment analysis or a word analysis. 

For example, if the question is "Why do you want a career in nutrition?", there are hundreds of different answers. Very few of them give exactly the same answer, but many include certain phrases like "helping others". 

So I want to be able to see those key terms/phrases, and have them counted automatically. 

How do I go about doing that?

Thanks!

Best Answer

  • jaeW_at_Onyx
    jaeW_at_Onyx Coach
    Answer ✓

    @user055735 , this is a VERY broad topic / question.

     

    Do you have some sample data you can upload?  Do you have a final dashboard in mind that you want to build?

     

    a way to think about getting started would be 

    1) pick 4 or 5 categories for each free text answer

    2) decide HOW you'll identify if the text answers the question.  Have a google for "stemming" and "lemmatization".  Have a google for "text analysis" in Python.  You may not know Python, but it'll give you an idea of the approach you'd need to take in SQL / Magic.  I've built use cases using "fuzzy matching" in SQL, so it depends on how complex you want to get.

     

     

    3) have a think about what you want your visualization to look like (word clouds are easy, but other cards will require your data to be structured a certain way in order to get the viz you want).
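
    To make step 2 a bit more concrete, here is a minimal sketch of counting recurring phrases in Python using only the standard library. The sample responses and the helper names (tokenize, ngrams, phrase_counts) are made up for illustration, and it does no stemming or lemmatization, so treat it as a starting point rather than a finished approach.

```python
import re
from collections import Counter

# Hypothetical sample responses; replace with your own survey column.
responses = [
    "I love helping others eat better",
    "Helping others is what motivates me",
    "I want to be healthy and enjoy helping others",
    "I want to be healthy",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a single phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_counts(texts, n=2):
    """Count in how many responses each n-word phrase appears."""
    counts = Counter()
    for text in texts:
        # set() so a phrase repeated inside one response is counted once
        counts.update(set(ngrams(tokenize(text), n)))
    return counts

# The five most frequent two-word phrases across all responses
top_phrases = phrase_counts(responses, n=2).most_common(5)
print(top_phrases)
```

    The same idea (split into words, build phrases, group and count) is roughly what you would need to reproduce in SQL or an ETL tool.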

     

    For example, if you want average sentiment, then you'll probably want your data in the form of one row per response with its sentiment score.
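
    As a rough sketch of that "one row per response plus a score" shape, the snippet below scores each response with a tiny word list. The POSITIVE and NEGATIVE sets, the sample responses, and the scoring formula are all assumptions made up for illustration; a real project would use a maintained sentiment lexicon or model instead.

```python
import re

# Hypothetical mini-lexicon for illustration only.
POSITIVE = {"love", "enjoy", "passionate", "rewarding"}
NEGATIVE = {"hate", "boring", "stressful", "difficult"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / token count, or 0.0 if empty."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

# Hypothetical (response_id, text) pairs
responses = [
    (1, "I love helping others"),
    (2, "School was stressful and boring"),
    (3, "It pays the bills"),
]

# One row per response, carrying its own sentiment score
rows = [
    {"response_id": rid, "text": text, "sentiment": sentiment_score(text)}
    for rid, text in responses
]

# With this shape, the average is a plain aggregate over the rows
avg_sentiment = sum(r["sentiment"] for r in rows) / len(rows)
print(rows, avg_sentiment)
```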

     

    If you want to analyze topics, decide whether each answer category is single response or multi response.  Single response means each answer gets exactly one category: "they are interested in nutrition b/c of helping others" OR "they are interested in nutrition b/c they want to be healthy".  For any question that could be multi response ("they are interested in nutrition b/c they like helping others AND they want to be healthy"), each category must be a separate row per response with a true or false value.  Once you've decided on the viz, then 'all you have to do' is figure out how your ETL will answer the question.
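
    Here is a small sketch of the multi response layout: one row per response per category, with a true/false flag. The CATEGORIES dictionary, its trigger phrases, and the tag_response helper are hypothetical names chosen for this example, and the matching is plain substring search, so it will also match things like "unhealthy".

```python
# Hypothetical categories and trigger phrases; choose your own per question.
CATEGORIES = {
    "helping others": ["helping others", "help people", "help others"],
    "being healthy": ["be healthy", "healthy", "my health"],
}

def tag_response(response_id, text):
    """Return one row per category with a True/False flag for this response."""
    lowered = text.lower()
    return [
        {
            "response_id": response_id,
            "category": category,
            "mentioned": any(phrase in lowered for phrase in phrases),
        }
        for category, phrases in CATEGORIES.items()
    ]

# Hypothetical responses keyed by response id
responses = {
    1: "I like helping others and I want to be healthy",
    2: "I just want to be healthy",
}

# Flatten into a long table: (response, category, mentioned)
rows = [row for rid, text in responses.items() for row in tag_response(rid, text)]
for row in rows:
    print(row)
```

    With the data in this long format, counting responses per category is just a filter on the flag plus a group by category.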

     

    I've done projects like this for market research companies as well as a social media use case... send me a PM if you're interested in co-building a small POC.

     

    Here's an article that gives you a pretty solid idea of what it can look like!  https://www.analyticsvidhya.com/blog/2018/02/the-different-methods-deal-text-data-predictive-python/

    Jae Wilson

This discussion has been closed.