Tôi đang cố gắng phát hiện các nốt ung thư phổi bằng các tệp DICOM. Các bước chính trong phát hiện ung thư bao gồm các bước sau.không thể phát sóng mảng đầu vào từ hình dạng (20,310,310) thành hình dạng (20)
1) Preprocessing
* Converting the pixel values to Hounsfield Units (HU)
* Resampling to an isomorphic resolution to remove variance in scanner resolution
*Lung segmentation
2) Training the data set using preprocessed images in Tensorflow CNN
3) Testing and validation
Tôi đã làm theo một vài hướng dẫn trực tuyến để thực hiện việc này.
tôi cần phải kết hợp các giải pháp được đưa ra trong
1) https://www.kaggle.com/gzuidhof/full-preprocessing-tutorial
2) https://www.kaggle.com/sentdex/first-pass-through-data-w-3d-convnet.
Tôi có thể triển khai ví dụ trong liên kết hai. Nhưng vì nó thiếu phân đoạn phổi ok và một vài bước tiền xử lý khác tôi cần phải kết hợp các bước trong liên kết một với liên kết hai. Nhưng tôi nhận được số lỗi trong khi thực hiện nó. Kể từ khi tôi mới để python ai đó có thể xin vui lòng giúp tôi trong việc giải quyết nó.
Có 20 thư mục bệnh nhân và mỗi thư mục bệnh nhân có số lượng lát, đó là tệp dicom.
Đối với phương thức process_data, slices_path của từng bệnh nhân và số bệnh nhân đã được gửi.
def process_data(slices,patient,labels_df,img_px_size,hm_slices):
try:
label=labels_df.get_value(patient,'cancer')
patient_pixels = get_pixels_hu(slices)
segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])
new_slices=[]
segmented_lung = segment_lung_mask(segmented_lungs2, False)
segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)
segmented_lungs=segmented_lungs_fill-segmented_lung
#This method returns smallest integer not less than x.
chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)
for slice_chunk in chunks(segmented_lungs,chunk_sizes):
slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []
#print (slice_chunk)
new_slices.append(slice_chunk)
print(len(segmented_lungs), len(new_slices))
if len(new_slices)==HM_SLICES-1:
new_slices.append(new_slices[-1])
if len(new_slices)==HM_SLICES-2:
new_slices.append(new_slices[-1])
new_slices.append(new_slices[-1])
if len(new_slices)==HM_SLICES+2:
new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
del new_slices[HM_SLICES]
new_slices[HM_SLICES-1]=new_val
if len(new_slices)==HM_SLICES+1:
new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
del new_slices[HM_SLICES]
new_slices[HM_SLICES-1]=new_val
print('LENGTH ',len(segmented_lungs), len(new_slices))
except Exception as e:
# again, some patients are not labeled, but JIC we still want the error if something
# else is wrong with our code
print(str(e))
#print(len(new_slices))
if label==1: label=np.array([0,1])
elif label==0: label=np.array([1,0])
return np.array(new_slices),label
phương thức Main
# Some constants
#data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
#patients = os.listdir(data_dir)
#labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
#patients.sort()
#print (labels_df.head())
much_data=[]
much_data2=[]
for num,patient in enumerate(patients):
if num%100==0:
print (num)
try:
slices = load_scan(data_dir + patients[num])
img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)
much_data.append([img_data,label])
#much_data2.append([processed,label])
except:
print ('This is unlabeled data')
np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)
#np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2)
Phần tiền xử lý hoạt động tốt nhưng khi tôi đang cố gắng để nhập trận chung kết ra đưa vào một Convolutional NN và đào tạo tập dữ liệu, Sau đây là các lỗi tôi nhận bao gồm một số ý kiến mà tôi đã đặt
0
shape hu
(113, 512, 512)
Resize factor
[ 2.49557522 0.6015625 0.6015625 ]
shape
(282, 308, 308)
chunk size
15
282 19
LENGTH 282 20
Tensor("Placeholder:0", dtype=float32)
..........1.........
..........2.........
..........3.........
..........4.........
WARNING:tensorflow:From C:\Research\Python_installation\lib\site-packages\tensorflow\python\util\tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
..........5.........
..........6.........
Epoch 1 completed out of 20 loss: 0
..........7.........
Traceback (most recent call last):
File "C:\Research\LungCancerDetaction\sendbox2.py", line 436, in <module>
train_neural_network(x)
File "C:\Research\LungCancerDetaction\sendbox2.py", line 424, in train_neural_network
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 606, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\framework\ops.py", line 3928, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 789, in run
run_metadata_ptr)
File "C:\Research\Python_installation\lib\site-packages\tensorflow\python\client\session.py", line 968, in _run
np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
File "C:\Research\Python_installation\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: could not broadcast input array from shape (20,310,310) into shape (20)
tôi nghĩ rằng đó là vấn đề với 'segmented_lungs = segmented_lungs_fill-segmented_lung'
nhữngTrong ví dụ làm việc,
segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in patient_pixels]
Xin hãy giúp tôi trong việc giải quyết này. Tôi không thể tiếp tục từ một thời gian. Nếu bất cứ điều gì không rõ ràng xin vui lòng cho tôi biết.
Sau đây là toàn bộ mã đã thử.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import dicom
import os
import scipy.ndimage
import matplotlib.pyplot as plt
import cv2
import math
import tensorflow as tf
from skimage import measure, morphology
from mpl_toolkits.mplot3d.art3d import Poly3DCollection
# Some constants
data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
patients = os.listdir(data_dir)
labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
patients.sort()
print (labels_df.head())
#Image pixel array watching
for patient in patients[:10]:
#label is to get the label of the patient. This is what done in the .get_value method.
label=labels_df.get_value(patient,'cancer')
path=data_dir+patient
slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
#You have dicom files and they have attributes.
slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
print (len(slices),slices[0].pixel_array.shape)
#If u need to see many slices and resize the large pixelated 2D images into 150*150 pixelated images
IMG_PX_SIZE=50
HM_SLICES=20
for patient in patients[:1]:
#label is to get the label of the patient. This is what done in the .get_value method.
label=labels_df.get_value(patient,'cancer')
path=data_dir+patient
slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
#You have dicom files and they have attributes.
slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
#This shows the pixel arrayed image related to the second slice of each patient
#subplot
fig=plt.figure()
for num,each_slice in enumerate(slices[:16]):
print (num)
y=fig.add_subplot(4,4,num+1)
#down sizing everything. Resize the imag size as their pixel values are 512*512
new_image=cv2.resize(np.array(each_slice.pixel_array),(IMG_PX_SIZE,IMG_PX_SIZE))
y.imshow(new_image)
plt.show()
print (len(patients))
###################################################################################
def get_pixels_hu(slices):
image = np.array([s.pixel_array for s in slices])
# Convert to int16 (from sometimes int16),
# should be possible as values should always be low enough (<32k)
image = image.astype(np.int16)
# Set outside-of-scan pixels to 0
# The intercept is usually -1024, so air is approximately 0
image[image == -2000] = 0
# Convert to Hounsfield units (HU)
for slice_number in range(len(slices)):
intercept = slices[slice_number].RescaleIntercept
slope = slices[slice_number].RescaleSlope
if slope != 1:
image[slice_number] = slope * image[slice_number].astype(np.float64)
image[slice_number] = image[slice_number].astype(np.int16)
image[slice_number] += np.int16(intercept)
return np.array(image, dtype=np.int16)
#The next problem is each patient is got different number of slices . This is a performance issue.
# Take the slices and put that into a list of slices and chunk that list of slices into fixed numer of
#chunk of slices and averaging those chunks.
#yield is like 'return'. It returns a generator
def chunks(l,n):
for i in range(0,len(l),n):
#print ('Inside yield')
#print (i)
yield l[i:i+n]
def mean(l):
return sum(l)/len(l)
def largest_label_volume(im, bg=-1):
vals, counts = np.unique(im, return_counts=True)
counts = counts[vals != bg]
vals = vals[vals != bg]
if len(counts) > 0:
return vals[np.argmax(counts)]
else:
return None
def segment_lung_mask(image, fill_lung_structures=True):
# not actually binary, but 1 and 2.
# 0 is treated as background, which we do not want
binary_image = np.array(image > -320, dtype=np.int8)+1
labels = measure.label(binary_image)
# Pick the pixel in the very corner to determine which label is air.
# Improvement: Pick multiple background labels from around the patient
# More resistant to "trays" on which the patient lays cutting the air
# around the person in half
background_label = labels[0,0,0]
#Fill the air around the person
binary_image[background_label == labels] = 2
# Method of filling the lung structures (that is superior to something like
# morphological closing)
if fill_lung_structures:
# For every slice we determine the largest solid structure
for i, axial_slice in enumerate(binary_image):
axial_slice = axial_slice - 1
labeling = measure.label(axial_slice)
l_max = largest_label_volume(labeling, bg=0)
if l_max is not None: #This slice contains some lung
binary_image[i][labeling != l_max] = 1
binary_image -= 1 #Make the image actual binary
binary_image = 1-binary_image # Invert it, lungs are now 1
# Remove other air pockets insided body
labels = measure.label(binary_image, background=0)
l_max = largest_label_volume(labels, bg=0)
if l_max is not None: # There are air pockets
binary_image[labels != l_max] = 0
return binary_image
#Loading the files
#Load the scans in given folder path
def load_scan(path):
slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
try:
slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
except:
slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
for s in slices:
s.SliceThickness = slice_thickness
return slices
def resample(image, scan, new_spacing=[1,1,1]):
# Determine current pixel spacing
spacing = np.array([scan[0].SliceThickness] + scan[0].PixelSpacing, dtype=np.float32)
resize_factor = spacing/new_spacing
new_real_shape = image.shape * resize_factor
new_shape = np.round(new_real_shape)
real_resize_factor = new_shape/image.shape
new_spacing = spacing/real_resize_factor
print ('Resize factor')
print (real_resize_factor)
image = scipy.ndimage.interpolation.zoom(image, real_resize_factor, mode='nearest')
print ('shape')
print (image.shape)
return image, new_spacing
'''def chunks(l,n):
for i in range(0,len(l),n):
#print ('Inside yield')
#print (i)
yield l[i:i+n]
def mean(l):
return sum(l)/len(l)'''
#processing data
def process_data(slices,patient,labels_df,img_px_size,hm_slices):
#for patient in patients[:10]:
#label is to get the label of the patient. This is what done in the .get_value method.
try:
label=labels_df.get_value(patient,'cancer')
print ('label process data')
print (label)
#path=data_dir+patient
#slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]
#You have dicom files and they have attributes.
slices.sort(key = lambda x: float(x.ImagePositionPatient[2]))
#This shows the pixel arrayed image related to the second slice of each patient
patient_pixels = get_pixels_hu(slices)
print ('shape hu')
print (patient_pixels.shape)
segmented_lungs2, spacing = resample(patient_pixels, slices, [1,1,1])
#print ('Pix shape')
#print (segmented_lungs2.shape)
#segmented_lungs=np.array(segmented_lungs2).tolist()
new_slices=[]
segmented_lung = segment_lung_mask(segmented_lungs2, False)
segmented_lungs_fill = segment_lung_mask(segmented_lungs2, True)
segmented_lungs=segmented_lungs_fill-segmented_lung
#print ('length of segmented lungs')
#print (len(segmented_lungs))
#print ('Shape of segmented lungs......................................')
#print (segmented_lungs.shape)
#print ('hiiii')
#segmented_lungs=[cv2.resize(each_slice,(IMG_PX_SIZE,IMG_PX_SIZE)) for each_slice in segmented_lungs3]
#print ('bye')
#print ('length of slices')
#print (len(slices))
#print ('shape of slices')
#print (slices.shape)
#print (each_slice.pixel_array)
#This method returns smallest integer not less than x.
chunk_sizes =math.ceil(len(segmented_lungs)/HM_SLICES)
print ('chunk size ')
print (chunk_sizes)
for slice_chunk in chunks(segmented_lungs,chunk_sizes):
slice_chunk=list(map(mean,zip(*slice_chunk))) #list - []
#print (slice_chunk)
new_slices.append(slice_chunk)
print(len(segmented_lungs), len(new_slices))
if len(new_slices)==HM_SLICES-1:
new_slices.append(new_slices[-1])
if len(new_slices)==HM_SLICES-2:
new_slices.append(new_slices[-1])
new_slices.append(new_slices[-1])
if len(new_slices)==HM_SLICES-3:
new_slices.append(new_slices[-1])
new_slices.append(new_slices[-1])
new_slices.append(new_slices[-1])
if len(new_slices)==HM_SLICES+2:
new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
del new_slices[HM_SLICES]
new_slices[HM_SLICES-1]=new_val
if len(new_slices)==HM_SLICES+1:
new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
del new_slices[HM_SLICES]
new_slices[HM_SLICES-1]=new_val
if len(new_slices)==HM_SLICES+3:
new_val =list(map(mean, zip(*[new_slices[HM_SLICES-1],new_slices[HM_SLICES],])))
del new_slices[HM_SLICES]
new_slices[HM_SLICES-1]=new_val
print('LENGTH ',len(segmented_lungs), len(new_slices))
except Exception as e:
# again, some patients are not labeled, but JIC we still want the error if something
# else is wrong with our code
print(str(e))
#print(len(new_slices))
if label==1: label=np.array([0,1])
elif label==0: label=np.array([1,0])
return np.array(new_slices),label
# Some constants
#data_dir = '../../CT_SCAN_IMAGE_SET/IMAGES/'
#patients = os.listdir(data_dir)
#labels_df=pd.read_csv('../../CT_SCAN_IMAGE_SET/stage1_labels.csv',index_col=0)
#patients.sort()
#print (labels_df.head())
much_data=[]
much_data2=[]
for num,patient in enumerate(patients):
if num%100==0:
print (num)
try:
slices = load_scan(data_dir + patients[num])
img_data,label=process_data(slices,patients[num],labels_df,IMG_PX_SIZE,HM_SLICES)
much_data.append([img_data,label])
#much_data2.append([processed,label])
except:
print ('This is unlabeled data')
np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data)
#np.save('muchdata-{}-{}-{}.npy'.format(IMG_PX_SIZE,IMG_PX_SIZE,HM_SLICES),much_data2)
IMG_SIZE_PX = 50
SLICE_COUNT = 20
n_classes=2
batch_size=10
x = tf.placeholder('float')
y = tf.placeholder('float')
keep_rate = 0.8
def conv3d(x, W):
return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME')
def maxpool3d(x):
# size of window movement of window as you slide about
return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME')
def convolutional_neural_network(x):
# # 5 x 5 x 5 patches, 1 channel, 32 features to compute.
weights = {'W_conv1':tf.Variable(tf.random_normal([3,3,3,1,32])),
# 5 x 5 x 5 patches, 32 channels, 64 features to compute.
'W_conv2':tf.Variable(tf.random_normal([3,3,3,32,64])),
# 64 features
'W_fc':tf.Variable(tf.random_normal([54080,1024])),
'out':tf.Variable(tf.random_normal([1024, n_classes]))}
biases = {'b_conv1':tf.Variable(tf.random_normal([32])),
'b_conv2':tf.Variable(tf.random_normal([64])),
'b_fc':tf.Variable(tf.random_normal([1024])),
'out':tf.Variable(tf.random_normal([n_classes]))}
# image X image Y image Z
x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1])
conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1'])
conv1 = maxpool3d(conv1)
conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2'])
conv2 = maxpool3d(conv2)
fc = tf.reshape(conv2,[-1, 54080])
fc = tf.nn.relu(tf.matmul(fc, weights['W_fc'])+biases['b_fc'])
fc = tf.nn.dropout(fc, keep_rate)
output = tf.matmul(fc, weights['out'])+biases['out']
return output
much_data = np.load('muchdata-50-50-20.npy')
# If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this
train_data = much_data[:-4]
validation_data = much_data[-4:]
def train_neural_network(x):
print ('..........1.........')
prediction = convolutional_neural_network(x)
print ('..........2.........')
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction,y))
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction,labels=y))
print ('..........3.........')
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)
print ('..........4.........')
hm_epochs = 20
with tf.Session() as sess:
sess.run(tf.initialize_all_variables())
successful_runs = 0
total_runs = 0
print ('..........5.........')
for epoch in range(hm_epochs):
epoch_loss = 0
for data in train_data:
total_runs += 1
try:
X = data[0]
Y = data[1]
_, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
epoch_loss += c
successful_runs += 1
except Exception as e:
# I am passing for the sake of notebook space, but we are getting 1 shaping issue from one
# input tensor. Not sure why, will have to look into it. Guessing it's
# one of the depths that doesn't come to 20.
pass
#print(str(e))
print ('..........6.........')
print('Epoch', epoch+1, 'completed out of',hm_epochs,'loss:',epoch_loss)
print ('..........7.........')
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('Done. Finishing accuracy:')
print('Accuracy:',accuracy.eval({x:[i[0] for i in validation_data], y:[i[1] for i in validation_data]}))
print('fitment percent:',successful_runs/total_runs)
print (x)
# Run this locally:
train_neural_network(x)
PS: resample(), segment_lung_mask() phương pháp có thể được tìm thấy từ liên kết 1.
Vui lòng bao gồm tất cả mã của bạn. Đặc biệt, chúng tôi không thể nhìn thấy bản in đang gây ra lỗi của bạn. Mặc dù nó chắc chắn trông giống như 'x: [i [0] cho i trong validation_data]' đang cho ăn quá nhiều dữ liệu cùng một lúc. – Stephen
Xin chào Stephen .. Tôi đã bao gồm mã. – user3789200
Điều gì sẽ xảy ra nếu bạn chỉ chạy 'np.array (x)' và 'np.array (y)' mà không đi qua TensorFlow? Tôi nghi ngờ thông tin hình dạng Numpy không hợp lệ. –