2023-02-07

Using chatGPT in the teaching of R

I am currently facilitating an introductory course in programming with R within my organisation. We use datacamp.com as a resource and meet online once a week to share.

This week my input was an interaction with chatGPT, see below. My impression is that AI can be quite helpful in learning a programming language, especially if you can ask good questions. In the future, my focus as a course organiser will be less on the content and more on being together as a learning group and sharing.

Also worth reading on the topic of chatGPT is a discussion thread on datamethods.org.


Here is the AI interaction:

I want to extract the first five characters of a string using R. What would the code look like?

You can extract the first five characters of a string in R using the substring function. Here is an example:

string <- "Hello World"
result <- substring(string, 1, 5)
print(result)

This code will return the following output:

[1] "Hello"

Explain how vectorisation works in R!

In R, vectorization is the process of performing operations on entire arrays of data, rather than iterating over the elements one by one. This is important because vectorized operations are typically much faster than non-vectorized operations, due to the optimized underlying code.

The key to vectorization in R is that all basic data structures in R are arrays, or vectors. This means that operations performed on them are automatically vectorized. For example, if you add two vectors of the same length, the result is a new vector where each element is the sum of the corresponding elements from the input vectors.

Here is an example to illustrate this:

a <- c(1, 2, 3, 4, 5)
b <- c(5, 4, 3, 2, 1)
c <- a + b
print(c)

This code will return the following output:

[1] 6 6 6 6 6

In this example, the + operator is applied to the entire vectors a and b, producing the new vector c that contains the sums of corresponding elements.

It’s worth noting that not all operations in R are vectorized, and some operations can only be vectorized under certain conditions. For example, some operations require that all elements in a vector have the same length and structure, and some operations may return unexpected results if applied to vectors with different lengths.
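The caveat about vectors of different lengths can be made concrete with a minimal sketch. It is not part of the ChatGPT answer, and the vectors and variable names are made up for illustration. It shows R's recycling rule, where the shorter vector is reused to match the longer one:

```r
# Vectors of different lengths: R recycles the shorter one.
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)
recycled <- a + b   # b is reused as 10, 20, 10, 20, 10, 20
print(recycled)     # [1] 11 22 13 24 15 26

# When the longer length is not a multiple of the shorter one,
# R still recycles but normally emits a warning (suppressed here).
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(partial)      # [1] 11 22 13
```

Because the first case runs silently, it is probably the more likely source of "unexpected results" in practice.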

Give me another example!

Sure, here’s another example that demonstrates vectorization in R:

x <- c(1, 2, 3, 4, 5)
y <- x * 2
print(y)

In this code, we create a vector x with 5 elements, and then use the * operator to multiply each element in x by 2. The result is a new vector y that contains the doubled values of the elements in x.

[1]  2  4  6  8 10

This demonstrates how vectorization allows us to perform operations on entire arrays of data, without having to loop over the elements one by one. This is a more efficient and concise way to work with data in R.
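For comparison, here is a small sketch of what "looping over the elements one by one" would look like next to the vectorised form. The variable names y_loop and y_vec are illustrative only, and the sketch makes no claim about timings:

```r
x <- c(1, 2, 3, 4, 5)

# Explicit loop: fill a pre-allocated result element by element.
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] * 2
}

# Vectorised form: one expression over the whole vector.
y_vec <- x * 2

# Both approaches give the same result.
print(identical(y_loop, y_vec))   # [1] TRUE
```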

What are the most important string functions in base R?

Here are some of the most commonly used string functions in base R:

  1. nchar: returns the number of characters in a string
  2. substring: extracts a substring from a string
  3. paste: concatenates strings into a single string
  4. toupper: converts all characters in a string to uppercase
  5. tolower: converts all characters in a string to lowercase
  6. gsub: replaces all occurrences of a pattern in a string with a replacement string
  7. grep: searches for a pattern in a string and returns the matching elements
  8. grepl: similar to grep, but returns a logical vector indicating which elements match the pattern
  9. strsplit: splits a string into substrings based on a specified separator
  10. trimws: removes leading and trailing whitespace from a string.

These functions form the basis for many common string manipulation tasks in R, and are commonly used in data cleaning and preparation. It’s worth noting that there are many more string functions available in R, and many specialized string manipulation packages available in the R ecosystem.
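The functions in the list can be tried out with a short sketch like the following. The example strings are made up, and the comments show what base R should return for them. Note that grep returns the positions of matching elements by default, and the elements themselves only with value = TRUE:

```r
s <- "  Hello World  "

trimmed   <- trimws(s)                    # "Hello World"
n_chars   <- nchar(trimmed)               # 11
first5    <- substring(trimmed, 1, 5)     # "Hello"
upper     <- toupper(trimmed)             # "HELLO WORLD"
lower     <- tolower(trimmed)             # "hello world"
joined    <- paste("Hello", "R")          # "Hello R" (default separator is a space)
replaced  <- gsub("o", "0", trimmed)      # "Hell0 W0rld"
pieces    <- strsplit(trimmed, " ")[[1]]  # "Hello" "World"

words     <- c("apple", "banana", "cherry")
match_idx <- grep("an", words)                 # 2 (index of the match)
match_val <- grep("an", words, value = TRUE)   # "banana"
match_lgl <- grepl("an", words)                # FALSE TRUE FALSE
```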

Translate excel string functions to their corresponding R functions!

Here are some common Excel string functions and their equivalent R functions:

Excel function    R function
LEFT              substring
RIGHT             substring
MID               substring
LEN               nchar
UPPER             toupper
LOWER             tolower
CONCATENATE       paste
REPLACE           gsub
SEARCH            grep or grepl

Note that the R functions are not always a direct match to their Excel counterparts, but they can be used to achieve the same results. For example, substring can be used to implement the functionality of the LEFT, RIGHT, and MID functions by specifying different arguments for the start and end positions of the desired substring.
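As a sketch of how substring can stand in for LEFT, RIGHT, and MID, one could write small wrapper functions. The names excel_left, excel_right, and excel_mid are hypothetical helpers invented for this illustration. They are not part of base R or the ChatGPT answer:

```r
# Hypothetical helpers mimicking Excel's LEFT, RIGHT and MID (illustrative names).

# LEFT(text, n): the first n characters.
excel_left <- function(text, n) {
  substring(text, 1, n)
}

# RIGHT(text, n): the last n characters, located via nchar.
excel_right <- function(text, n) {
  len <- nchar(text)
  substring(text, len - n + 1, len)
}

# MID(text, start, n): n characters beginning at position start.
excel_mid <- function(text, start, n) {
  substring(text, start, start + n - 1)
}

string <- "Hello World"
print(excel_left(string, 5))     # [1] "Hello"
print(excel_right(string, 5))    # [1] "World"
print(excel_mid(string, 7, 3))   # [1] "Wor"
```

Since substring is itself vectorised, these helpers should also work on a whole character vector at once.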

3 comments:

Anonymous said...

ChatGPT is good at regular expression related things and some basic R questions. However, it fails miserably in most R flows that cannot be done in one line. I'm quite fed up with "Yes, you are correct... Here is a corrected code..." ... which still has fundamental errors. Not the quickest way to solve those R problems. Less R code available for training it?

Karsten W. said...

When the AI suggests wrong code again and again, that is a No-go for me. Hopefully, some competition will rectify that.

StatistikinDD said...

Nice post. Yeah, ChatGPT can do amazing things, but not reliably yet. Will be very interesting to see how it develops. I'm a bit on the skeptical side, as it does not seem to have proper language understanding.