How To Prompt ChatGPT To Create a Complete Data Profiling Framework

Getting to know your dataset inside and out is crucial for any data science project, but it's not always clear where to start. This ChatGPT prompt helps you build a comprehensive data profiling framework tailored to your specific dataset and goals. Instead of generic advice, you get guidance on everything from basic statistical summaries to advanced quality assessments. The prompt has ChatGPT ask a set of targeted questions first, so it understands your requirements and constraints before making recommendations.

Prompt
You will act as an expert data scientist and guide me through data profiling so I can thoroughly understand my dataset. Your task is to provide a step-by-step explanation of data profiling techniques, including but not limited to data quality assessment, statistical summaries, data type identification, missing value analysis, and outlier detection. Also explain how to interpret the results of these analyses to gain actionable insights. Match my communication style: clear, concise, and free of unnecessary jargon, while still technically accurate. Provide examples where applicable to illustrate key concepts.

**To get the best possible response, please ask me the following questions before you begin:**
1. What is the size and structure of your dataset (e.g., number of rows and columns, file format)?
2. What specific questions or goals do you have for understanding your dataset (e.g., identifying trends, cleaning data, preparing for machine learning)?
3. Are there any particular data quality issues you suspect or have already identified (e.g., missing values, duplicates)?
4. What tools or programming languages are you using for data profiling (e.g., Python, R, Excel, SQL)?
5. What are your specific constraints or preferences for the analysis (e.g., time limitations, computational resources)?
6. Are there any columns or features in your dataset that require special attention (e.g., categorical variables, date-time fields)?
7. How familiar are you with statistical concepts and data analysis techniques?
8. Would you like recommendations for visualizations to better understand the dataset?
9. Are there any privacy or ethical considerations related to your dataset that should be addressed?
10. Do you have a preferred format for the output (e.g., bullet points, detailed paragraphs, code snippets)?
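If you want a feel for the kind of checks this prompt asks ChatGPT to walk you through, here is a minimal sketch in Python using only the standard library. It is not part of the prompt and not necessarily what ChatGPT will produce. It assumes your data is loaded as a list of dictionaries (for example via `csv.DictReader`), treats `None` and blank strings as missing, calls a column numeric only if every non-missing value parses as a float, and flags outliers with Tukey's 1.5 × IQR rule. The function names `profile` and `profile_column` and the sample rows are hypothetical, for illustration only.

```python
import statistics
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing (an assumption; adapt to your data)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def infer_type(values):
    """Hypothetical rule: a column is 'numeric' if every non-missing value parses as a float."""
    if not values:
        return "empty"
    for v in values:
        try:
            float(v)
        except (TypeError, ValueError):
            return "text"
    return "numeric"


def profile_column(raw):
    """Missing values, inferred type, distinct count, and type-specific summaries."""
    present = [v for v in raw if not is_missing(v)]
    info = {
        "missing": len(raw) - len(present),
        "missing_pct": round(100 * (len(raw) - len(present)) / len(raw), 1),
        "type": infer_type(present),
        "distinct": len(set(present)),
    }
    if info["type"] == "numeric":
        nums = [float(v) for v in present]
        info.update(min=min(nums), max=max(nums),
                    mean=statistics.fmean(nums), median=statistics.median(nums))
        if len(nums) >= 2:
            # Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
            q1, _, q3 = statistics.quantiles(nums, n=4)
            iqr = q3 - q1
            low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
            info["outliers"] = [x for x in nums if x < low or x > high]
    elif info["type"] == "text":
        # Most frequent categories, useful for spotting typos or imbalance.
        info["top_values"] = Counter(present).most_common(3)
    return info


def profile(rows):
    """Profile a list of dict rows: row count, exact duplicate rows, per-column stats."""
    columns = list(rows[0].keys())
    unique_rows = {tuple(sorted(r.items())) for r in rows}
    return {
        "rows": len(rows),
        "duplicate_rows": len(rows) - len(unique_rows),
        "columns": {c: profile_column([r.get(c) for r in rows]) for c in columns},
    }


# Hypothetical sample: one blank age, one suspicious age, one duplicated row.
rows = [
    {"age": "34", "city": "Lyon"}, {"age": "29", "city": "Paris"},
    {"age": "", "city": "Paris"}, {"age": "31", "city": "Lyon"},
    {"age": "250", "city": "Nice"}, {"age": "33", "city": "Paris"},
    {"age": "30", "city": "Lyon"}, {"age": "32", "city": "Paris"},
    {"age": "28", "city": "Nice"}, {"age": "29", "city": "Paris"},
]
report = profile(rows)
print(report["duplicate_rows"])              # 1
print(report["columns"]["age"]["outliers"])  # [250.0]
```

In this sample the median age (31) stays close to the typical values while the mean is pulled above 55 by the single 250 entry, which is exactly the kind of interpretation step the prompt asks ChatGPT to explain. With pandas, the same checks map onto built-ins such as `DataFrame.describe`, `DataFrame.isna`, and `DataFrame.duplicated`.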