ChatGPT: Zero to Hero - Lewis Does Data

Tell Me About Generative AI and ChatGPT

Generative artificial intelligence (AI) has seen a meteoric rise over the last year or so, being heralded as a major technological breakthrough. This transformative domain of artificial intelligence has fundamentally altered the way we interact with and harness AI capabilities, reshaping everything from answering questions to generating creative content.

The essence of generative AI lies in its ability to produce new content by identifying patterns in existing data. ChatGPT stands at the forefront in this arena and in public consciousness. Its remarkable aptitude for understanding, interpreting, and responding to natural language queries opens the door to a plethora of applications, extending far beyond the realm of conventional chatbots.

With ChatGPT, the possibility of generating contextually relevant language, summarising information, and even performing creative tasks has now become accessible to a broad spectrum of users, and its ability to recall past interactions and make contextually accurate responses further amplifies this utility.

In this post, I am going to provide an overview of how to use ChatGPT, a framework for deciding on whether a task represents a genuine use-case, some important ethical and legal considerations to take into account when using generative AI, and a bunch of great tips and tricks for supercharging your ChatGPT queries.

ChatGPT Workflow Essentials

The ChatGPT workflow can be condensed into three main steps:

User provides input
The input is processed by the underpinning large language model (LLM)
The generated response is provided to the user

Now, I could get technical at this point and emphasise how the LLMs at the core of models like ChatGPT leverage intricate and complex algorithms for language pattern and structure recognition, and how they use this to interpret the input and generate coherent, meaningful responses. Instead, I’m telling you that in essence, you put something in, some magic happens, and you get something out.

Part of the reason for this slight over-simplification is to emphasise that, although natural language processing (NLP) and techniques like byte-pair encoding are fascinating topics, we don’t need to understand the inner workings of the ChatGPT LLM to use it effectively.

However, this apparently simple user experience shouldn’t belie the absolute requirement for high quality input to get the best quality output in a timely manner. I am paraphrasing a little, but I think George Fuechsel first coined the expression that captures the sentiment of this point the best.

“If you put sh*t in, you will always get sh*t out!” — George Fuechsel

Along with a foundational understanding of prompt engineering, we also need to have an appreciation of when and why we should consider using ChatGPT and generative AI, and when and why we definitely shouldn’t.

Let’s look at each of these topics in more detail.

Tips For Writing Effective ChatGPT Prompts

In all levels of education, we were taught that all questions are valuable. While I agree with the general sentiment of this statement in this context, there definitely are stupid questions. This is even more true when it comes to ChatGPT. Remember what George Fuechsel said.

Instead of going round and round receiving poor quality answers and engaging in extended exchanges to get out what you wanted, it is well advised to invest some time in crafting effective prompts. Here’s a quick list of my top 10 tips for how to do that.

Tip 1: Define Your Objective

Before typing your query, take a moment to define your goal or objective in context. Be clear about what you want to understand or learn, as having a clear objective will guide your query.

Tip 2: Use Clear and Specific Language

Clarity is essential, so use clear and specific language to convey your questions. Avoid technical jargon or convoluted sentences that might hinder comprehension.

Tip 3: Keep It Concise

Aim for brevity in your prompts. Succinct questions can lead to more focused and informative responses.

Tip 4: Use Proper Grammar and Punctuation

Use proper grammar and punctuation. Well-constructed sentences, and correct syntax in any code you include, aid comprehension.

Tip 5: Ask One Question at a Time

For clear and informative responses, focus on one aspect of a concept in each query. If you have multiple queries, break them down into separate questions.

Tip 6: Provide Context

When seeking explanations around a specific topic, provide context to help the model understand your level of familiarity with the topic or the depth that you want to go into. If troubleshooting, try to aid the model in understanding the specific challenges you’re facing. Running multiple successive queries to build up context can really help here.

Tip 7: Specify Format

If you have a particular format in mind for the explanation or output, specify it. This helps the model tailor its response to your preferences. A good example when coding would be to ask for explanations with specific code to illustrate the concepts being explained.

Tip 8: Use Follow-up Questions

Don’t hesitate to ask follow-up questions if the initial explanation doesn’t fully clarify your understanding. Reference the model’s previous response to seek additional insights.

Tip 9: Experiment and Refine

If you’re not getting the desired level of clarity in the explanation, experiment with different phrasings or approaches. Iteration can lead to more informative responses.

Tip 10: Check for Understanding

After receiving a response or an explanation, take a moment to ensure that the model has grasped what you asked and that you grasp the explanation provided correctly. If there’s any confusion, ask for clarifications or provide specific instructions for further guidance.

I have kept these pointers general and not provided specific examples due to the sheer breadth of potential applications and scenarios within which they could be placed when using ChatGPT.

I will be producing specific guides for how to use these pointers in the technical context of programming with scripting languages such as R, Python and Bash in future, so keep an eye out for those.
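In the meantime, here is a minimal Python sketch of how a few of the tips (one clear objective, some context, and a requested format) can be combined into a single prompt. The function name `build_prompt` and the example wording are my own illustrative assumptions rather than anything ChatGPT requires; the output is just text that you would paste into the chat box.

```python
def build_prompt(objective, context="", output_format=""):
    """Assemble one focused prompt from the parts described in the tips.

    Hypothetical helper for illustration only: it simply joins text.
    """
    parts = [objective.strip()]                            # Tip 1: a single, clear objective
    if context:
        parts.append(f"Context: {context.strip()}")        # Tip 6: provide context
    if output_format:
        parts.append(f"Format: {output_format.strip()}")   # Tip 7: specify format
    return "\n".join(parts)


prompt = build_prompt(
    objective="Explain what an anonymous function is in R.",
    context="I am comfortable with base R but new to functional programming.",
    output_format="A short explanation followed by one commented code example.",
)
print(prompt)
```

Keeping the three parts separate makes it easy to change one of them (say, the format) and re-run the query, which is the kind of iteration that Tip 9 describes.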

Embrace The Revolution…

The potential applications and implications of tools like ChatGPT are vast.

One of ChatGPT’s notable strengths lies in its ability to summarise complex concepts and text, making it invaluable for tasks such as simplifying intricate or technical information, and for summarising lengthy reports and documents to a format suitable for a diverse array of audiences.

For example, we could ask for the following simplified summary:

ChatGPT maintains context throughout the conversation, allowing users to make follow-up corrections and benefit from its contextual understanding. We can leverage this here to easily request a more concise version of the initial summary:

We could also have given it some text and asked it to summarise this for us. To do this, first make the initial request and confirm that ChatGPT has understood what we are trying to do:

You can then provide the text that you want it to summarise or simplify. In this case, I gave it the overview of parallel computing from Wikipedia.

The output is a condensed version of the information it was given. Pretty cool.

As well as providing summaries or condensing information, ChatGPT is also highly proficient at creative tasks, making it a valuable resource for things like writing email templates, assisting in copyediting, and generating blog posts (not guilty - this is all my own blood, sweat and tears). By incorporating ChatGPT into these types of tasks and creative workflows, people and organisations can streamline processes and save both time and resources. The model’s efficiency in handling repetitive tasks allows human experts to focus on more technical, intricate and strategic responsibilities.

Here’s an example. A typical workflow of summarising a multi-page report for your boss or a group of project stakeholders would usually involve considerable manual effort to undertake tedious and error-prone tasks such as document scanning, extraction of key findings, summary compilation, and final proofreading. The analogous ChatGPT-augmented workflow would involve constructing a well-engineered prompt, interacting with ChatGPT, and finally proofreading the output summary for accuracy. This shift significantly reduces manual tasks, saving time and improving the overall quality of the outcome.

ChatGPT’s impact extends to all industries and roles, though some specific considerations are necessary in some cases; we will cover these shortly.

In the context of software engineers, data professionals, and other technical roles, ChatGPT can expedite tasks such as code-related research, documentation, and problem-solving, enhancing productivity and efficiency. It can also, if used wisely, help with tasks like generating code templates, providing explanations for code and errors, and offering suggestions for improvements. Note the cautious language here.

“ChatGPT can supercharge your workflow if you understand a topic well or it can leave you exposed for not knowing what you’re doing” — Kyle E. Walker, Ph.D.

ChatGPT is still far from perfect for these types of applications and so should be used with caution. In my experience, the code ChatGPT produces is often very clunky, and sometimes it is out-and-out wrong. Ask ChatGPT for an explanation of a nuanced concept like anonymous functions and you will see what I mean.

Stack Overflow, you’re still safe for now!

…But Tread Carefully

Although ChatGPT is a valuable tool that can perform a huge variety of tasks, there are some key limitations and considerations to be aware of to use it effectively and safely.

When developing ChatGPT, the underpinning LLM was shown a huge amount of text data (training data) from a wide variety of sources. Using this data set, the LLM built its understanding of the structure of language by looking at the frequency and order of words. This process of training was and is continually being fine-tuned via an iterative process that accounts for the quality rating of ChatGPT’s responses. The sheer amount and variety of the data used to train ChatGPT is a large part of its success, but it is also the predominant reason for the key limitations that users should be aware of.

Knowledge Cutoff

ChatGPT was trained on data that stops at a specific date (currently September 2021). As the model isn’t connected to the internet or other external sources, it isn’t aware of events beyond this date.

Training Data Bias

ChatGPT was trained on a massive text dataset from a variety of sources, including books, articles, and websites, some of which have led to the model learning biases inherent to their content, meaning that it can produce biased responses. Left-wing political and gender bias are just a couple that have been uncovered by researchers so far.

Context Tracking

As we have seen already, ChatGPT can retain and utilise conversational context. However, if the topic of the conversation shifts multiple times, it can struggle to keep track, and may generate inaccurate or irrelevant responses as a consequence.

A good rule of thumb to prevent this is to keep a conversation to one topic and create new conversations for different topics.

Hallucination

This is when the model confidently tells us inaccurate information and often occurs when attempting to go beyond the knowledge cutoff or capabilities of ChatGPT. The anonymous function example I cited earlier is a good example of this; if I hadn’t known better, I would have thought that it was 100% certain that the information provided was accurate.

I have a handy trick to help with this later.

Legal and Ethical Considerations

It’s easy to fall into one of several legal and ethical grey areas if the use cases for ChatGPT aren’t properly scoped so that ownership and legal implications are well-understood and accepted. We will cover these in a bit more detail next, as they are key considerations when identifying use cases for ChatGPT.

Identifying Use-Cases

While ChatGPT offers a broad range of capabilities, it’s crucial to assess the suitability of the tool for specific use cases through consideration of the following factors.

Accuracy Requirement

ChatGPT’s responses may not always be completely accurate and can be unpredictable. It is prudent, therefore, to determine whether a prospective use case demands a high degree of response accuracy. Instances that require certainty in the quality of the response, such as policy advisory, should almost certainly not involve the use of ChatGPT.

Quality Validation

Even in less stringent circumstances than policy advisory, it is always advisable to ensure that someone suitably qualified is available to review and validate the quality of responses. Expert oversight is essential when precision is paramount, and it would, at best, be irresponsible to begin using unvalidated responses to drive decision-making, and at worst, have potential legal consequences.

Sensitive Data Handling

It is certainly problematic to use ChatGPT where sensitive data is involved, but not impossible. The critical issues here are that data governance laws (e.g., GDPR) must be adhered to and that appropriate consent to process or use the data in question is granted.

If sensitive data is to be used, it is strongly advisable to seek prior legal counsel from someone who specialises in data governance.

Ownership Considerations

These are important if the intention is to generate revenue from the response. Like when using sensitive data, ownership issues can make using ChatGPT problematic but not impossible, provided that users comply with the OpenAI terms of use and that copyright infringement and plagiarism risks are minimised.

As with using sensitive data, it would be strongly advisable to seek legal counsel from someone who specialises in copyright and intellectual property law.

Rather than end on this cautionary note, let’s look at some more advanced ChatGPT user tips to supercharge your ChatGPT interactions.

Advanced ChatGPT

When I say advanced, I’m not talking APIs or backend programmatic wizardry; these tips still relate to the web-based ChatGPT platform but will augment the fundamentals listed earlier to really elevate your ChatGPT interactions to the next level.

Advanced Tip 1 - Customise Instructions

To optimise your interactions with ChatGPT, utilise custom instructions. You can find this option by clicking on your profile icon:

When you click on each of the boxes available for providing custom instructions, some handy thought starters will appear to help you with what to include. Make sure that you provide details that give some background context and will help ChatGPT deliver better responses.

The specific information you provide will depend on what you’d like ChatGPT to know about you to improve and personalise outputs. If you’re a data professional, a teacher, or a journalist, for example, inform ChatGPT of your profession. If you want responses relevant to your region, provide a general location without disclosing precise details. This customisation ensures responses are well aligned with your background.

In the second of the custom instruction boxes, you can instruct ChatGPT on tone, opinion stance, and response length. This minimises the need for repetitive exchanges, resulting in more efficient interactions.

For instance, if you’re an R user who requires concise code, you can instruct ChatGPT: “When I request R code, provide the most efficient code with snippets and no additional explanations”.

Another great inclusion here is an instruction regarding the confidence level for factual topics; instruct ChatGPT to report its confidence level in these instances. You can also stipulate that it should provide valid sources and inform you when the response includes conjecture. These measures make it easier to judge the quality of the information returned and help to combat those hallucinations I spoke about earlier.

Try running a query once you have saved these custom settings to make sure that they have worked:

Advanced Tip 2 - Style Engineering

This is one of my favourite power-user tips.

If you require ChatGPT to produce content in your personal style, such as when constructing email templates, tweets (are they still called tweets? “Xs” doesn’t sound right) or blog posts, you can train ChatGPT to do this by using examples.

Begin by explaining your intentions:

You can use this example as a general guideline, replacing “data science blog” with the type of content you wish to generate and the temporary variable “LQ_STYLE” with one for your own style.

Next, provide examples from which ChatGPT can learn. Working from real samples of your writing makes it easier for ChatGPT to produce content that replicates your style.

Here, I copied the introduction paragraphs from my recent Tidyverse Tips & Tricks and Introduction to ggplot2 posts.

ChatGPT will confirm that it has learned from the examples and provide a short analysis of the writing style. I think the comments it gave for my couple of examples are pretty flattering, even if I do say so myself.

Finally, follow the prompt that ChatGPT provided for a topic or content to use for composing the requested output:

Not bad for 5 minutes’ work. Needs a bit of editing but let’s see if we can get ChatGPT to do a bit more of the heavy lifting first.

Advanced Tip 3 - Self-Critique

If you receive a response that isn’t quite satisfactory, you can ask ChatGPT to critically review its own responses, provide feedback and revise the content:

The content here is much better than the first attempt. Obviously, I am not going to manually integrate these changes:

Not bad. Even with personalisation, the analogies lack that niche 80s and 90s pop culture flavour that only years of being a total nerd can bring, but this is a great start for minimal effort. I might even use this in future 🤔

Advanced Tip 4 - Self-Prompting

Speaking of minimal effort, what about getting ChatGPT to engineer and optimise its own prompts?

To do this, we can request that ChatGPT generate a set of ideal prompts for a specific task. It will usually return a series of questions about the topic, target audience and specific requirements:

Provide very brief answers to these in order and include the respective question numbers to maintain context. The output should provide you with a series of possible prompts that you can use to achieve the best results from ChatGPT.

Rather than just using one, you can actually get ChatGPT to critically analyse these prompts and provide insight into why each might be effective. All very meta!

Let’s try and use one of these and see what it comes up with:

We can see that it has produced a post with a casual tone that uses a lot of buzzwords and emojis to grab the attention of the reader. It is aimed at the target audience that we provided, and it does explain what the main topic and selling points of the article are, and how reading the post will be beneficial.

Pretty cool, eh?

Advanced Tip 5 - Specify Output Length and Format

This final tip feels like a bit of an anti-climax after the last two, but it is still a super handy one to know nonetheless; you can customise the output to suit your needs.

First, you can specify a maximum word count in your prompts to avoid overly lengthy responses. Here, I have gone with a very slight modification on the earlier example where I provided the overview of parallel computing from Wikipedia and asked ChatGPT to summarise it for me, only this time I have specified that the output summary shouldn’t exceed 150 words in length.

The output it returned was 124 words long. Nice and simple.
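If you want to check a limit like this yourself rather than trusting the response, a few lines of Python will do it. This is only a rough sketch: it counts whitespace-separated words, and the `summary` text below is a short placeholder, not the actual ChatGPT output.

```python
def word_count(text):
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())


def within_limit(text, max_words=150):
    """Return True if the text is no longer than max_words words."""
    return word_count(text) <= max_words


# Placeholder text standing in for a pasted ChatGPT summary.
summary = "Parallel computing runs many calculations at the same time."

print(word_count(summary), within_limit(summary))
```

Different tools count hyphenated words and punctuation slightly differently, so treat the number as a guide rather than an exact figure.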

Something that most people don’t realise is that you can also ask ChatGPT for output in a format other than plain text, including tables, CSV, HTML, JSON, XML, and even data frames for your favourite data wrangling packages like pandas.

To demonstrate this capability, I have asked ChatGPT for some information that would lend itself to tabulation:

I have then asked it to present this information in a tabular format in the HTML language so that it could be dropped straight into any web page or document rendered in HTML. I have also specifically asked for it to name the variables in the table:

Now, let’s just copy and paste the output into an HTML viewer to check that it has worked in the intended manner:

Voilà! Great little time-saver.

Note that the HTML document setup code shown here was already in place on html.onlineviewer.net and I simply pasted the ChatGPT output below starting on line 26.
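The same copy-and-check step works for the other formats too. As a minimal sketch, assuming you have asked for CSV or JSON and pasted the response into a Python string, the standard library can confirm that the output is well formed before you use it. The two response strings below are made-up placeholders, not real ChatGPT output.

```python
import csv
import io
import json

# Hypothetical responses pasted from ChatGPT; the values are placeholders.
csv_response = "name,score\nalpha,1\nbeta,2\n"
json_response = '[{"name": "alpha", "score": 1}, {"name": "beta", "score": 2}]'

# Parse the CSV text into a list of dictionaries keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_response)))

# json.loads raises json.JSONDecodeError if the response is not valid JSON.
records = json.loads(json_response)

print(rows[0]["name"], len(rows))
print(records[1]["score"])
```

Note that the CSV reader returns every value as text, whereas JSON keeps numbers as numbers, which is worth bearing in mind when choosing which format to ask for.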

Summarise This Post

Generative AI such as ChatGPT holds the promise of revolutionising how we interact with, utilise, and benefit from artificial intelligence. Its profound impact on productivity, creative expression, and knowledge dissemination is nothing short of extraordinary.

To make the most of tools like ChatGPT you must focus on input quality because, as what I am calling “Fuechsel’s law” states, sh*t in means sh*t out. By understanding a topic well, following the information I have presented here, and providing clear instructions, you can harness the power of ChatGPT effectively and efficiently to supercharge your workflows.

But remember that with great power comes great responsibility. Knowledge cutoff, potential bias in training data and challenges in maintaining context in multifaceted conversations necessitate careful handling. Ethical and legal considerations, such as plagiarism, academic malpractice, intellectual property rights and data privacy, are crucial factors when using generative AI and ChatGPT, and doing so with a lack of domain knowledge can and will leave you exposed when interacting with those who deeply understand a subject.

. . . . .

Thanks for reading. I hope you enjoyed the article and that it helps you to get a job done more quickly or inspires you to further your data science journey. Please do let me know if there’s anything you want me to cover in future posts.

If this tutorial has helped you, consider buying me a coffee on Ko-fi!

Happy Data Analysis!

. . . . .

Disclaimer: All views expressed on this site are exclusively my own and do not represent the opinions of any entity whatsoever with which I have been, am now or will be affiliated.