# Statistical Tools Working Group

The goal of this working group is to discuss and work on Statistical Tools issues that come up through our collaborations on the CTSpedia.

# Minutes - Statistical Tools Working Group

## July 7, 2011

### Helping

Babak Shahbaba from UC Irvine joined the call. He is giving R-Workshops at UC Irvine to teach basic R-programming. He has R-code for graphics that he can share with us. Mac Gordon is in charge of the Labs and Liver section of the Statistical Graphics area and can use some help with coding. Most pharmaceutical companies use SAS - so he can use code in any statistical program. Art suggested that we look at the type of graphic or plot rather than specific examples. Sally pointed out that we could use one function for both Figure 01 and 02.

### Organizing the Material

In adding code to the Statistical Graphics we are looking at presentation (e.g. legends) vs functionality (e.g. line plots). Art suggested that we look at utility, statistical coding, and graphing with examples as a sub-category of statistical coding. We want to differentiate R and SAS. Yinglin suggested that we have example SAS code and example graphics. Peter suggested that we differentiate example, function, and macros. We would have R: function and example and SAS: macro and example.

Art has graphics code that he can send. Babak has R-templates that he will send us.

### CDISC and Annotations

Frank has re-formatted the CDISC data that Mat put on CTSpedia. If we can get access to all the data, we can then develop code for everything. Mac pointed out that we need to beef up descriptions perhaps with a paragraph explaining variables and legends. Frank said that R-attributes contain links to all this information.

### R-Code or R-Functions

Sally pointed out that there is a difference between R-code and R-functions: R-code is not a function. Frank said that we are talking about generic functions or macros versus editable code. We need to start with the materials that we have access to, and consequently we have more R-code than R-functions. As we get access to more complete datasets we can build more functions.

## May 12, 2011

### Working with PhUSE

We spoke with Ben Szilagyi from PhUSE. He would like us to share our work across our two wikis. We will schedule a call with Ben.

Ben wrote the following: "Thank you for this update. In principle any page is openly visible for the programmers. They can simply go to www.phuse.com and click the Wiki Icon. From there they can navigate wherever they want to. I can imagine that the Good Programming Practices section could be of interest (e.g. the 'Robustness' Wiki could be interesting to start with). Direct Link is: http://www.phuse.eu/wiki-robustness.aspx. Else the derivation standards has some interesting topics, such as 'Imputing Partial Dates' or 'Assigning NCI CTC Grades To Laboratory Results'." I find it easier to register and then take a look around.

### Macro Attachments

Cells are needed to enter output or image, SAS example, Main Macro, Called Macros, Log or output text, Sample Data. We must keep the original URL for all attachments. Putting the information in text is not necessary and takes up a lot of space.

Naming attachments: We will use the name of the school, title of the macro, and a description, e.g. Rochester_Categorize_Main Macro.
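The naming rule above can be sketched as a small helper. This is only an illustration of the convention, assuming the three parts are joined with underscores as in the example; the function name `attachment_name` is hypothetical and is not part of CTSpedia.

```python
def attachment_name(school: str, title: str, description: str) -> str:
    """Build an attachment name as school_title_description.

    Hypothetical helper illustrating the naming convention in these
    minutes; the underscore separator is taken from the example
    "Rochester_Categorize_Main Macro".
    """
    parts = [school.strip(), title.strip(), description.strip()]
    if not all(parts):
        raise ValueError("school, title, and description are all required")
    return "_".join(parts)


print(attachment_name("Rochester", "Categorize", "Main Macro"))
# prints: Rochester_Categorize_Main Macro
```

Requiring all three parts keeps names comparable across schools, since a missing part would otherwise shift the meaning of the remaining ones.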

### Utility Macros

Utility macros are school specific. We need cells for the actual macro, calls for macro, and output.

### Special Features

We need to accommodate special features - like enhanced output, colors, legend.

### Notes

We need a separate box for notes.

### Checklists

Checklists for the clients should be incorporated into the form.

## October 20, 2010

### PhUSE Meeting

Rui Chen, Kimberly Kaukeinen, and Arthur Watts attended the Pharmaceutical Users Software Exchange (PhUSE) Meeting in Boston. The meeting focused on industry standards for databases, which are not applicable to our work. We were hoping that they would discuss more standards for analysis but right now the group is trying to make sure all data is collected in the same way.

### R-Functions and SAS Macros

The group finds that the R-programs are not as good as the SAS programs. They do not give the same answers.

### Re-organizing the CTSpedia

We are in the process of re-organizing the CTSpedia with boxes summarizing and grouping materials on the home page and not so much text. In terms of the Statistical Tools, Sally pointed out that we should be capturing Tools - SAS and Tools - R, and then within those groups we should have sub-divisions, e.g. under Tools - SAS we will have SAS Macros, SAS Utility, and SAS Macro Header.

### Naming Conventions

CTSpedia allows one to re-use file attachment names as long as you are in a new topic. Currently, we have 9 SampleRun.sas attachments because they are in different topics, i.e. they use different Macro Header Forms. Art has a set of naming conventions that he has used in his work. He will work with Mary to upload his protocol for naming conventions to the CTSpedia.

## September 14, 2010

### Attendees

Peter Bacchetti, Mary Banach, Rui Chen, Barbara Grimes, Chengshi Jin, Arthur Watts, Mat Soukup, Sally Thurston, Xin Tu, Yinglin Xia

### Introduction - Mat Soukup

Mat is leading the Statistical Graphics collaboration between the FDA, pharmaceutical companies, and CTSpedia. He presented some of his thoughts on important elements of tools that we are sharing. (We will attach Mat's slides here.)

One of the important areas for collaboration will be documenting how code was validated.

### PhUSE Meeting

The European Pharmaceutical Users Software Exchange is meeting in Boston. Mat invited us to attend and three of the Rochester biostatisticians will be attending the meeting. The Europeans are farther along on working with SAS and R-scripts together.

We feel that CTSpedia can really help with the developmental stages of creating code. In this regard Sally pointed out that it is important for us to get together with the PhUSE group early on. Mat said that we will have different intended uses for the code, and Peter noted that it is important to have everyone sharing and testing the code and reporting problems, including file naming difficulties.

### Naming Conventions

We are actually talking about many different aspects of the Statistical Tools site when we are talking about naming conventions. Currently, we have the CTSpedia topic name, which is usually the Title on the Macro Form, Code attachment name, Example attachment name, Output or Image attachment name, and sometimes a sample dataset attachment name.

Peter began this discussion by suggesting that we always use the URL. Art has used a nesting structure for his call names, as in the Utility Macros on the CTSpedia. Art and Chengshi Jin use an underscore before the first character when they are working on the macro.

Jeff Horner at Vanderbilt has uploaded the tagging plug-in from TWiki, and we will be able to start tagging, using keywords, and showing tag clouds with size reflecting the number of times the tag was used (as Yinglin suggested).
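The idea of sizing each tag by how often it was used can be sketched as follows. This is a minimal illustration, not the TWiki plug-in's actual behavior: the function name `tag_sizes`, the linear scaling, and the 10 to 30 point size range are all assumptions made for the example.

```python
from collections import Counter


def tag_sizes(tags, min_size=10, max_size=30):
    """Map each tag to a font size that grows with its usage count.

    Hypothetical sketch: sizes are scaled linearly between min_size
    (least-used tag) and max_size (most-used tag).
    """
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if lo == hi:
        # All tags used equally often: give them all the same size.
        return {tag: min_size for tag in counts}
    span = hi - lo
    return {
        tag: round(min_size + (n - lo) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }


# Example tags as they might be applied across several topics.
used = ["SAS", "SAS", "SAS", "R", "R", "graphics"]
print(tag_sizes(used))
# prints: {'SAS': 30, 'R': 20, 'graphics': 10}
```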

We will return to this discussion and make formal changes to the CTSpedia macros.

### Working Group

We will have a monthly working group with University of Rochester and UCSF programmers to discuss issues on the site, meetings that we have attended, and new collaborations. We will call on Mat when questions come up about our mutual efforts. One of our goals will be to talk to the Statistical Graphics group to see how we can develop SAS code to work with the R-code that they are developing.